Shuttle vectors

ABSTRACT

The invention provides shuttle vectors, and methods of using shuttle vectors, capable of expression in, at least, a mammalian cell. Furthermore, the shuttle vectors are capable of replication in at least yeast, and optionally, bacterial cells. Also provided is a method wherein yeast are transformed with a shuttle vector as provided herein. Heterologous nucleic acids flanked by 5′ and 3′ ends identical to a homologous recombination site within the shuttle vector are introduced to the transformed yeast and allowed to homologously recombine with the shuttle vector such that they are inserted into the vector by the yeast organism. The shuttle vector is then recovered and transferred to a mammalian cell for expression.

This is a continuation of application Ser. No. 09/208,827, filed on Dec. 9, 1998, now U.S. Pat. No. 6,391,582, for which the Issue Fee was paid on Dec. 28, 2001, and Ser. No. 09/133,944, filed on Aug. 14, 1998, now U.S. Pat. No. 6,280,937, issued on Aug. 28, 2001.

FIELD OF THE INVENTION

The invention relates to novel shuttle vectors, and more particularly, shuttle vectors capable of replication in at least yeast and capable of expression in at least a mammalian cell.

BACKGROUND OF THE INVENTION

The introduction of cloned nucleotide sequences into mammalian cells has greatly facilitated the study of the control and function of various eukaryotic genes. Mammalian cells provide an environment conducive to appropriate protein folding, post-translational processing, feedback control, protein-protein interactions, and other eukaryotic protein modifications such as glycosylation and oligomerization. Thus, a number of expression vectors have been developed which allow the expression of a polypeptide in a mammalian cell.

The typical mammalian expression vector will contain (1) regulatory elements, usually in the form of viral promoter or enhancer sequences; (2) a multicloning site, usually having specific restriction enzyme sites to facilitate the insertion of a DNA fragment into the vector; and (3) sequences responsible for intron splicing and polyadenylation of mRNA transcripts. Generally, sequences facilitating the replication of the vector in both bacterial and mammalian hosts and a selection marker gene which allows selection of transformants in bacteria are also included. The bacterial elements, or in some cases phage elements, are included to provide the option of further analysis of the nucleic acid inserts amplified and isolated from the bacteria or phage.

In the past, the insertion of a heterologous nucleic acid (insert) into the multicloning site of a mammalian expression vector has generally been accomplished by one of two methods. In the first method, the insert is cut out of a bacterial expression vector and ligated into the mammalian expression vector. In the second method, often called “TA cloning”, special ends are generated on the insert by PCR such that the modified insert can be put into the mammalian expression vector. Each of these methods requires a number of steps, including enzymatic reactions, which can be labor intensive and unreliable. Moreover, cloning efficiency drops significantly as the size of the insert increases.

Another method used for inserting a heterologous nucleic acid (insert) into an expression vector takes advantage of yeast's high efficiency at homologous recombination in vivo. In this method, a nucleic acid fragment flanked by 5′ and 3′ homologous regions is co-introduced into a yeast with a vector which has regions identical to the 5′ and 3′ regions flanking the fragment. The yeast efficiently homologously recombines such that the fragment inserts into the region of the vector flanked by the before-mentioned 5′ and 3′ regions. H. a., et al., Plasmid, 38:91-96 (1997), incorporated herein. Unfortunately, yeast are the only organisms able to efficiently recombine so as to insert heterologous nucleic acids into a vector. Therefore, to date, there is not an efficient method or means to transfer inserts into a specific region of a vector used for expression in mammalian cells.

Accordingly, it is an object of the invention to provide compositions and methods useful in facilitating the insertion of a heterologous nucleic acid into a vector which can express the heterologous nucleic acid in at least a mammalian cell.

Moreover, it is an object of this invention to provide a shuttle vector and methods of use which allow replication of the shuttle vector at least in yeast and which allow expression in at least a mammalian cell.

SUMMARY OF THE INVENTION

The invention provides shuttle vectors, and methods of using shuttle vectors, capable of expression in at least a mammalian cell. Furthermore, the shuttle vectors are capable of replication in at least yeast, and optionally, bacterial cells.

In one aspect of the invention, the invention provides a shuttle vector comprising an origin of replication functional in yeast and preferably, a reporter gene functional in yeast. The shuttle vector further comprises a promoter functional in a mammalian cell, capable of directing transcription of a polypeptide coding sequence operably linked to said promoter.

In another aspect of the invention, the shuttle vector comprises an insertion site operably linked to said promoter. The insertion site preferably allows for homologous recombination with a heterologous nucleic acid. In one embodiment, the insertion site has 5′ and 3′ regions identical to 5′ and 3′ regions flanking a nucleic acid to be inserted into the vector.

Optionally, the shuttle vector comprises any one or more of the following: an internal ribosome entry sequence (IRES), a polyadenylation sequence and a splice sequence.

In another aspect of the invention, the shuttle vector further comprises an origin of replication functional in a bacterial cell and preferably, a selectable gene functional in a bacterial cell. The shuttle vector may also comprise an origin of replication functional in a mammalian cell, and optionally, a selectable gene functional in a mammalian cell.

The present invention also provides methods for using the shuttle vectors provided herein. In one embodiment, heterologous nucleic acids flanked by regions identical to flanking regions of the insertion site within a shuttle vector are co-introduced to yeast with the shuttle vector and allowed to homologously recombine such that the heterologous nucleic acids are inserted into the shuttle vector by the yeast organism. In preferred embodiments, the heterologous nucleic acids are introduced to the yeast in a linear nucleic acid. The shuttle vector is then recovered and transferred to a mammalian cell for expression.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic of a shuttle vector in accordance with the present invention referred to herein as pPYC-R. The following abbreviations are used herein: IRES for internal ribosome entry sequence, GFP for green fluorescent protein, Amp^(R) for an ampicillin resistance gene, TRP for a tryptophan gene, and MCS for multi-cloning site or sequence.

FIGS. 2A through 2D show the nucleotide sequence (SEQ ID NO:1) of pPYC-R. In SEQ ID NO:1, a CMV promoter is at nucleotides 4853-5614, an IRES is at nucleotides 6001-6505, a GFP gene is at nucleotides 6506-7258, an Amp^(R) gene is at nucleotides 9888-655, an E. coli origin of replication site is at nucleotides 656-1456, a yeast 2μ origin of replication is at nucleotides 1461-2808, and a TRP gene is at nucleotides 3344-4018. The intron contains 5′ and 3′ splice sites.

FIG. 3 is a photograph showing the results of a mammalian cell transfection assay using the shuttle vector pPYC-R.

FIG. 4 is a schematic of a shuttle vector in accordance with the present invention referred to herein as pPYC.

FIGS. 5A through 5D show the nucleotide sequence (SEQ ID NO:2) of pPYC. In SEQ ID NO:2, a CMV promoter is at nucleotides 1-750, an IRES site is at nucleotides 1158-1662, a GFP gene is at nucleotides 1683-2402, a yeast 2μ origin of replication is at nucleotides 2985-4332, a tryptophan gene is at nucleotides 4868-5542, an ampicillin resistance gene is at nucleotides 5982-6842, and an E. coli origin of replication is at nucleotides 7142-7669. The tag is hemagglutinin (HA).
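The coordinates listed for pPYC can be tabulated to check element lengths and to confirm that the listed elements do not overlap. The following is a minimal sketch in Python; the names PPYC_FEATURES, feature_length and features_overlap are illustrative assumptions, and only the coordinates themselves are taken from the description of SEQ ID NO:2.

```python
# Feature coordinates (1-based, inclusive) as listed for pPYC (SEQ ID NO:2).
# The dictionary and function names are illustrative, not part of the disclosure.
PPYC_FEATURES = {
    "CMV promoter": (1, 750),
    "IRES": (1158, 1662),
    "GFP": (1683, 2402),
    "yeast 2-micron ori": (2985, 4332),
    "TRP": (4868, 5542),
    "AmpR": (5982, 6842),
    "E. coli ori": (7142, 7669),
}


def feature_length(start, end):
    """Length in nucleotides of a feature given inclusive 1-based coordinates."""
    return end - start + 1


def features_overlap(features):
    """Return True if any two features share at least one nucleotide."""
    spans = sorted(features.values())
    return any(
        prev_end >= next_start
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:])
    )


for name, (start, end) in PPYC_FEATURES.items():
    print(f"{name}: {start}-{end} ({feature_length(start, end)} nt)")
print("overlapping features:", features_overlap(PPYC_FEATURES))
```

This sketch treats every feature as a simple linear interval. A feature that spans the origin of a circular sequence, such as the Amp^(R) gene listed at nucleotides 9888-655 of SEQ ID NO:1, would need the total plasmid length to compute its size.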

FIG. 6 is a photograph showing pPYC transfection wherein pPYC includes a gene causing apoptosis in accordance with the present invention. The photograph shows extensive cell death due to the expression of HA-tagged Rip.

FIG. 7 is a schematic of a shuttle vector in accordance with the present invention referred to herein as pCRU5YMS.

FIGS. 8A through 8C show the nucleotide sequence (SEQ ID NO:5) of pCRU5YMS. At nucleotides 1-664 is the CMV promoter; at nucleotides 2197-2725 is the IRES; at nucleotides 2746-3465 is the GFP gene; at nucleotides 3522-4252 is the LTR; at nucleotides 4253-5500 is the yeast selection marker TRP gene; at nucleotides 5512-6860 is the yeast 2 micron replication origin; at nucleotides 6861-7650 is the E. coli replication origin; and at nucleotides 7678-8538 is the ampicillin resistance gene.

FIG. 9 is a photograph showing the results of a mammalian cell transfection assay using the shuttle vector pCRU5YMS.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides shuttle vectors, and methods of use, wherein the shuttle vectors are capable of expression in at least a mammalian cell and capable of replication in at least yeast. In the past, vectors have been constructed so as to be functional in certain aspects either in mammalian cells or yeast, but not both. As described herein, different hosts provide different advantages to an expression vector and in particular, to the expression product. By providing a vector which is functional as described herein in multiple hosts, the invention allows the advantages provided by varying hosts to be obtained by the use of a single tool.

For example, a vector having the ability to replicate in yeast is useful for a variety of reasons. An advantage of the yeast system is its efficiency at homologous recombination. Orr-Weaver, et al., PNAS USA, 80:4417-4421 (1983), incorporated herein by reference. Taking advantage of yeast's ability to insert heterologous nucleic acids into a vector eliminates the steps of manipulating the ends of the vector and the heterologous nucleic acid and ligating the two together. Another advantage of this system is that yeast can be transformed with large nucleic acids, i.e., up to at least 10 kilobases, which can then be inserted into the vector.

Moreover, yeast is a well-studied organism, which facilitates its use. In particular, yeast has been widely used to detect protein-protein interactions in the “two-hybrid system”. The two-hybrid system is a method used to identify and clone genes for proteins that interact with a protein of interest. Briefly, the system indicates protein-protein interaction by the reconstitution of GAL4 function, which is detectable and only occurs when the proteins interact. This system and general methodologies concerning the transformation of yeast with expressible vectors are described in Cheng-Ting et al., PNAS USA, 88:9578-9582 (1991), Fields and Song, Nature, 340:245-246 (1989), and Chevray and Nathans, PNAS USA, 89:5789-5793 (1992), each incorporated herein in its entirety.

Regarding mammalian cells, these cells are preferred for the expression of eukaryotic proteins, particularly when determining or studying the function of the protein. Mammalian cells are able to reproduce the protein's proper glycosylation and oligomerization, folding, post-translational processing, feedback control, protein-protein interaction, etc., and thus are advantageous for expression of eukaryotic and particularly mammalian proteins.

Regarding bacterial cells or phage, these systems are also very well studied and are therefore easily manipulated. In particular, bacteria and phage are useful for the rapid amplification of nucleic acids.

Thus, while vectors which function in one of yeast, mammalian cells, bacteria or phage are useful, vectors which can successfully shuttle between these systems are particularly desirable. This invention provides such vectors. In a preferred embodiment, the shuttle vector functions in both yeast and mammalian cells. Such a shuttle vector can allow for the exploitation of the yeast two-hybrid system and of yeast's ability to homologously recombine, and can provide a convenient means for subsequent expression in mammalian cells to, for example, verify protein-protein interactions, study the protein's function, etc.

In one embodiment, the invention provides a shuttle vector comprising an origin of replication functional in yeast and a promoter functional in a mammalian cell. Preferably, the shuttle vector also comprises a selectable gene functional in yeast.

The origin of replication functional in yeast is any nucleic acid sequence which allows replication of the shuttle vector independently from the chromosome. Generally, the origin of replication is functional in one or more of the following: Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, Kluyveromyces fragilis and K. lactis, Pichia guilliermondii and P. pastoris, Schizosaccharomyces pombe, and Yarrowia lipolytica. Suitable origin of replication sites include, for example, ars 1, centromere ori, and 2μ ori. Yeast origin of replication sites can be used to increase the copy number and to retrieve the vector from yeast.

The “promoter functional in a mammalian cell” or “mammalian promoter” is capable of directing transcription of a polypeptide coding sequence operably linked to said promoter. The choice of the promoter will depend in part on the mammalian cell into which the vector is put. Generally, this promoter is functional in one or more of the following: Chinese hamster ovary (CHO), BHK, 293, HeLa, NIH 3T3 and COS cells. More specific examples include monkey kidney CV1 line; human embryonic kidney line (293 or 293 cells subcloned for growth in suspension culture, Graham et al., J. Gen Virol., 36:59 (1977)); Chinese hamster ovary cells/-DHFR (CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77:4216 (1980)); mouse Sertoli cells (TM4, Mather, Biol. Reprod., 23:243-251 (1980)); human lung cells (WI38, ATCC CCL 75); human liver cells (Hep G2, HB 8065); and mouse mammary tumor (MMT 060562, ATCC CCL 51). The selection of the appropriate host cell is deemed to be within the skill in the art.

Transcription from vectors in mammalian host cells is controlled, for example, by promoters obtained from the genomes of viruses such as adenoviruses, retroviruses, lentiviruses, and herpes viruses, including but not limited to polyoma virus, fowlpox virus (UK 2,211,504 published Jul. 5, 1989), adenovirus 2, bovine papilloma virus, avian sarcoma virus, cytomegalovirus (CMV), hepatitis-B virus, Simian Virus 40 (SV40), Epstein Barr virus (EBV), feline immunodeficiency virus (FIV), and Srα, or are respiratory syncytial viral promoters (RSV) or long terminal repeats (LTRs) of a retrovirus, i.e., a Moloney Murine Leukemia Virus (MoMuLv) (Cepko et al. (1984) Cell 37:1053-1062). Moreover, the promoters can be selected from heterologous mammalian promoters, e.g., the actin promoter or an immunoglobulin promoter, and from heat-shock promoters, and functional derivatives thereof, provided such promoters are compatible with the host cell systems. The promoter functional in a mammalian cell can be inducible or constitutive.

In an embodiment provided herein, the shuttle vector is double stranded and, on a first strand, comprises a first promoter operably linked to either a coding sequence or a site for the insertion of a coding sequence of interest (i.e., a heterologous nucleic acid) followed by a polyadenylation site. On a second strand, the shuttle vector comprises two LTRs flanking said region comprising said first promoter and coding sequence or cloning site, wherein the LTRs operate in a direction opposite to said first promoter.

“Operably linked” as used herein means that the transcriptional and translational regulatory nucleic acid is positioned relative to any coding sequences in such a manner that transcription is initiated. Generally, this will mean that the promoter and transcriptional initiation or start sequences are positioned 5′ to the coding region. The transcriptional and translational regulatory nucleic acid will generally be appropriate to the host cell used, as will be appreciated by those in the art.

By “vector” or “episome” herein is meant a nucleic acid replicon used for the transformation of host cells. The vectors may be either self-replicating extrachromosomal vectors or vectors which integrate into a mammalian host genome, such as a retroviral based vector. In a preferred embodiment, the shuttle vector remains as an extrachromosomal vector in bacteria and yeast, and is integrated into the genome of the mammalian cell.

A preferred embodiment utilizes retroviral derived vectors. Currently, the most efficient gene transfer methodologies harness the capacity of engineered viruses, such as retroviruses, to bypass natural cellular barriers to exogenous nucleic acid uptake. The use of recombinant retroviruses was pioneered by Richard Mulligan and David Baltimore with the Psi-2 lines and analogous retrovirus packaging systems, based on NIH 3T3 cells (see Mann et al., Cell 33:153-159 (1983), hereby incorporated by reference). Such helper-defective packaging lines are capable of producing all the necessary trans proteins (gag, pol, and env) that are required for packaging, processing, reverse transcription, and integration of recombinant genomes. Those RNA molecules that have in cis the ψ packaging signal are packaged into maturing virions. In addition, transfection efficiencies of retroviruses can be extremely high, thus obviating the need for selection genes in some cases.

Retroviral transfection systems are further described in Mann et al., supra; Pear et al., PNAS USA 90(18):8392-6 (1993); Kitamura et al., PNAS USA 92:9146-9150 (1995); Kinsella et al., Human Gene Therapy 7:1405-1413; Hofmann et al., PNAS USA 93:5185-5190; Choate et al., Human Gene Therapy 7:2247 (1996); WO 94/19478; PCT US97/01019, and references cited therein, all of which are incorporated by reference.

Any number of suitable retroviral vectors may be used to construct the shuttle vectors of the invention. Preferred retroviral vectors include a vector based on the murine stem cell virus (MSCV) (see Hawley et al., Gene Therapy 1:136 (1994)), a modified MFG virus (Rivere et al., Genetics 92:6733 (1995)), and pBABE (see PCT US97/01019, incorporated by reference), and functional derivatives thereof.

In addition, it is possible to configure a retroviral vector to allow expression of genes after integration in target cells. For example, Tet-inducible retroviruses can be used to express genes (Hofmann et al., PNAS USA 93:5185 (1996)). Expression of this vector in cells is virtually undetectable in the presence of tetracycline or other active analogs. However, in the absence of Tet, expression is turned on to maximum within 48 hours after induction, with uniform increased expression of the whole population of cells that harbor the inducible retrovirus, indicating that expression is regulated uniformly within the infected cell population.

The shuttle vector can also be based on a non-retroviral vector. Any number of known vectors are suitable, including, but not limited to, pREP9, pCDNA, pCEP4 (Invitrogen), pCI and pCI-NEO (Promega). Basically, any vector can be reconstructed to contain the components as described herein. For example, construction of suitable vectors containing the components described herein can be achieved by employing standard ligation techniques which are known to the skilled artisan, using cloned or synthetic sequences.

In a preferred embodiment, the shuttle vector includes a selectable gene functional in yeast (also referred to herein as a yeast reporter gene). By “selectable gene” or “reporter gene” herein is meant a gene that by its presence in a host cell, i.e. upon expression, can allow the host to be distinguished from a cell that does not contain the selectable gene. Selectable genes can be classified into several different types, including survival and detection genes. It may be the nucleic acid or the protein expression product that causes the effect. Additional components, such as substrates, ligands, etc., may be added to allow selection or sorting on the basis of the selectable gene.

In a preferred embodiment, the selectable gene is a survival gene that serves to provide a nucleic acid (or encode a protein) without which the cell cannot survive, such as a drug resistance gene, a growth regulatory gene, or a nutritional requirement. The selectable gene functional in yeast is preferably a survival gene. Where a selectable gene functional in bacteria is included in the shuttle vector, a survival gene is also preferred.

Preferred survival genes functional in yeast include ADE2, HIS3, LEU2, TRP1, and URA3; ALG7, which confers resistance to tunicamycin; the neomycin phosphotransferase gene, which confers resistance to G418; the CUP1 gene, which allows yeast to grow in the presence of copper ions; and an adenine producing gene, or the like, which may be used alone or in combinations of two or more thereof. In a preferred embodiment, the trp1 gene is utilized. Stinchcomb et al., Nature, 282:39 (1979); Kingsman et al., Gene, 7:141 (1979); Tschemper et al., Gene, 10:157 (1980). The trp1 gene provides a selection marker for a mutant strain of yeast lacking the ability to grow in the absence of tryptophan, for example, ATCC No. 44076 or PEP4-1 [Jones, Genetics, 85:12 (1977)]. The preferred selectable gene functional in bacteria is a drug resistance gene, such as an ampicillin resistance gene.

In a preferred embodiment, the selectable gene is a detection gene. Where a selectable gene functional in mammalian cells is included in the vector, a detection gene is preferred. Detection genes encode a protein that can be used as a direct or indirect label, e.g., for sorting the cells, such as for cell enrichment by FACS. In this embodiment, the protein product of the selectable gene itself can serve to distinguish cells that are expressing the selectable gene. In this embodiment, suitable selectable genes include those encoding green fluorescent protein (GFP), blue fluorescent protein (BFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), luciferase, and β-galactosidase, all commercially available, e.g., from Clontech, Inc.

Alternatively, the selectable gene encodes a protein that will bind a label that can be used as the basis of selection; i.e. the selectable gene serves as an indirect label or detection gene. In this embodiment, the selectable gene should encode a cell-surface protein. For example, the selectable gene may encode any cell-surface protein not normally expressed on the surface of the cell, such that secondary binding agents could serve to distinguish cells that contain the selectable gene from those that do not. Alternatively, albeit non-preferably, selectable genes comprising normally expressed cell-surface proteins could be used, and differences between cells containing the selectable construct and those without could be determined. Thus, secondary binding agents bind to the selectable protein. These secondary binding agents are preferably labeled, for example with fluors, and can be antibodies, haptens, etc. For example, fluorescently labeled antibodies to the selectable gene can be used as the label. Similarly, membrane-tethered streptavidin could serve as a selectable gene, and fluorescent biotin could be used as the label, i.e. the secondary binding agent. Alternatively, the secondary binding agents need not be labeled as long as the secondary binding agent can be used to distinguish the cells containing the construct; for example, the secondary binding agents may be used in a column, and the cells passed through, such that the expression of the selectable gene results in the cell being bound to the column, and a lack of the selectable gene (i.e. inhibition) results in the cells not being retained on the column. Other suitable selectable proteins/secondary labels include, but are not limited to, antigens and antibodies, enzymes and substrates (or inhibitors), etc.

In one aspect of the invention, the shuttle vector includes an insertion site, which is used to insert a heterologous nucleic acid sequence of choice, for ultimate expression in mammalian cells. The insertion site can be either a cloning site, preferably a multicloning site (MCS), or a site suitable for homologous recombination (referred to herein as a homologous recombination site). The vector can include multiple insertion sites, including both cloning sites and at least one homologous recombination site.

In a preferred embodiment, the insertion site is a cloning site. A cloning site as used herein is a known sequence, preferably the only one on the vector (i.e., it is a unique sequence on the vector), upon which a restriction enzyme operates to linearize or cut the vector. A multicloning site, also sometimes referred to as a multiple cloning site, polylinker, or polycloning site, is a cluster of cloning sites such that many restriction enzymes operate thereon. A wide variety of these sites are known in the art.
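The requirement that a cloning site be unique on the vector can be checked computationally. The following is a minimal sketch in Python, assuming a circular vector supplied as a plain DNA string; the function names and the toy sequence are hypothetical, and the three recognition sequences shown (EcoRI, BamHI, HindIII) are standard palindromic sites chosen only for illustration.

```python
# Recognition sequences of a few common restriction enzymes. All three are
# palindromic, so scanning one strand is sufficient for these examples.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def count_sites(vector_seq, site):
    """Count occurrences of a recognition site in a circular vector sequence."""
    seq = vector_seq.upper()
    # Append the first len(site) - 1 bases so sites spanning the origin count.
    circular = seq + seq[: len(site) - 1]
    count = 0
    pos = circular.find(site)
    while pos != -1:
        count += 1
        pos = circular.find(site, pos + 1)
    return count


def unique_cutters(vector_seq, sites=SITES):
    """Return the names of enzymes whose site occurs exactly once."""
    return sorted(
        name for name, site in sites.items() if count_sites(vector_seq, site) == 1
    )


# Toy sequence (hypothetical): one EcoRI site, two BamHI sites, no HindIII site.
toy_vector = "TTGAATTCAAGGATCCTTGGATCCAA"
print(unique_cutters(toy_vector))
```

Only enzymes that cut exactly once would be candidates for a cloning site in the sense used above; enzymes with non-palindromic sites would additionally require scanning the reverse complement.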

In a preferred embodiment, the insertion site is a site that allows the introduction of the heterologous nucleic acid into the shuttle vector by homologous recombination. Homologous recombination is, briefly, the process of strand exchange that can occur spontaneously with the alignment of homologous sequences (i.e. sets of complementary strands). As is known in the art, yeast are efficient at homologous recombination. Orr-Weaver, et al., supra; H. a., et al., supra; Ma, et al., Gene, 58:201-216 (1987); Petermann, Nucleic Acids Res., 26(9):2252-2253 (1998); each incorporated herein by reference. Thus, in general, the homologous recombination site contains two distinct, but generally contiguous, regions. The first region, referred to herein as the 5′ region, is generally identical to the 5′ region flanking the heterologous nucleic acid to be inserted into the vector. The second region, referred to herein as the 3′ region, is generally identical to the 3′ region flanking the heterologous nucleic acid to be inserted into the vector. Preferably, the 5′ and 3′ regions are each at least 12 or 15 nucleotides long. More preferably, the 5′ and 3′ regions are each at least about 20 or 30 nucleotides long, and more preferably at least about 50 nucleotides long, and most preferably about 60 nucleotides long. These regions are preferably less than about 100 nucleotides long. Preferably, the homologous recombination site sequence is unique to the vector in that the vector does not comprise another sequence corresponding to the sequence of the homologous recombination site.
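The stated preferences for the 5′ and 3′ regions (a minimum length, an upper bound of about 100, and uniqueness within the vector) can be expressed as a simple check. The following is a minimal sketch in Python; the function names, the default thresholds of 12 and 100, and the toy sequences are assumptions made for illustration, and the vector is treated as a linear string for simplicity.

```python
def count_occurrences(seq, sub):
    """Count occurrences of sub in seq, including overlapping ones."""
    count = 0
    pos = seq.find(sub)
    while pos != -1:
        count += 1
        pos = seq.find(sub, pos + 1)
    return count


def check_homology_region(region, vector_seq, min_len=12, max_len=100):
    """Return a list of problems for one 5' or 3' region (empty if none found).

    min_len and max_len are illustrative defaults taken from the preferred
    ranges in the text: at least 12 and less than about 100.
    """
    problems = []
    if len(region) < min_len:
        problems.append(f"shorter than {min_len} nt")
    if len(region) >= max_len:
        problems.append(f"not shorter than {max_len} nt")
    found = count_occurrences(vector_seq.upper(), region.upper())
    if found != 1:
        problems.append(f"found {found} times in vector (expected 1)")
    return problems


# Hypothetical toy sequences.
arm = "GATTACACCGGTTAAC"  # 16 nt
vector = "AAAAAAAAAA" + arm + "CCCCCCCCCC"
print(check_homology_region(arm, vector))        # no problems expected
print(check_homology_region("GATTACA", vector))  # too short
```

A region that passes this check satisfies only the length and uniqueness preferences; it says nothing about recombination efficiency, which the text associates with longer regions of about 50 to 60.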

The insertion site is used to insert a heterologous nucleic acid. A “heterologous nucleic acid” as used herein refers to any nucleic acid inserted into the shuttle vector at a site operably linked to the promoter. Various embodiments of heterologous nucleic acids are further defined below. In a preferred embodiment, the heterologous nucleic acid is flanked by 5′ and 3′ regions identical to the 5′ and 3′ regions of a homologous recombination site on the shuttle vector provided herein. Thus, when the heterologous nucleic acid is inserted into the vector, the 5′ and 3′ regions flanking the heterologous nucleic acid replace the 5′ and 3′ regions of the homologous recombination site during homologous recombination.
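The outcome of this replacement can be modeled at the sequence level. The following is a minimal sketch in Python of the expected product only, not of the biological mechanism; the function name recombine and all sequences are hypothetical, and the very short toy regions are used purely for readability.

```python
def recombine(vector_seq, insert_seq, arm5, arm3):
    """Return the vector sequence expected after homologous recombination.

    vector_seq must contain arm5 followed (not necessarily immediately) by arm3.
    insert_seq must be arm5 + payload + arm3. Anything between the two regions
    in the vector is replaced by the payload.
    """
    if len(insert_seq) < len(arm5) + len(arm3):
        raise ValueError("insert is too short to carry both flanking regions")
    if not (insert_seq.startswith(arm5) and insert_seq.endswith(arm3)):
        raise ValueError("insert is not flanked by the 5' and 3' regions")
    start5 = vector_seq.find(arm5)
    if start5 == -1:
        raise ValueError("5' region not found in vector")
    start3 = vector_seq.find(arm3, start5 + len(arm5))
    if start3 == -1:
        raise ValueError("3' region not found downstream of 5' region")
    # The flanked insert takes the place of the homologous recombination site.
    return vector_seq[:start5] + insert_seq + vector_seq[start3 + len(arm3):]


# Hypothetical toy example with contiguous 5' and 3' regions in the vector.
arm5, arm3 = "AAACCC", "GGGTTT"
vector = "TT" + arm5 + arm3 + "CC"
insert = arm5 + "ATGTAA" + arm3
print(recombine(vector, insert, arm5, arm3))
```

In this model the recombinant differs from the starting vector only by the payload placed between the two regions, which is why no restriction digestion or ligation step appears.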

In a further aspect, the shuttle vector further comprises an origin of replication functional in a bacterial cell. The bacterial cell is generally any bacterial cell which can be used to amplify the shuttle vector. Examples include Gram-negative or Gram-positive organisms, for example, Enterobacteriaceae such as E. coli, Bacillus subtilis, Streptococcus cremoris, Streptococcus lividans. Various E. coli strains are publicly available, such as E. coli K12 strain MM294 (ATCC 31,446); E. coli X1776 (ATCC 31,537); E. coli strain W3110 (ATCC 27,325) and K5 772 (ATCC 53,635). Origin of replication sites are known in the art and are further described in Sambrook, et al., Molecular Cloning, 2nd Ed., Vol. 3, Chapter 1, particularly sections 12-20 (1989), and Promega, 1998 catalog number E1841 (pCI-neo).

In one embodiment, the shuttle vector also comprises an origin of replication functional in mammalian cells. As is known in the art, the only extrachromosomal vectors which replicate in mammalian cells are virally derived. A number of viral origins of replication require the binding of a specific viral replication protein to effect replication. Suitable origin of replication/viral replication protein pairs include, but are not limited to, the Epstein Barr origin of replication and the Epstein Barr nuclear antigen (see Sugden et al., Mole. Cell. Biol. 5(2):410-413 (1985)), and the SV40 origin of replication and the SV40 T antigen (see Margolskee et al., Mole. Cell. Biol. 8(7):2837 (1988)). The coding sequence for the viral replication protein can be on the shuttle vector provided herein, or on a separate vector.

In an additional aspect of this invention, the shuttle vector comprises additional sequences, including but not limited to at least one or all of the following: an internal ribosome entry sequence (IRES), an RNA splice site (also called a splice signal or sequence herein) and a polyadenylation site (also called a polyadenylation signal or sequence herein).

IRES elements function as initiators of the efficient translation of reading frames. In particular, IRES allows for the translation of two different genes on a single transcript. IRES thus greatly facilitates the selection of cells expressing peptides at uniformly high levels. IRES elements are known in the art and are further characterized in Kim, et al., Molecular and Cellular Biology 12(8):3636-3643 (August 1992) and McBratney, et al., Current Opinion in Cell Biology 5:961-965 (1993).

All of those sequences of viral, cellular, or synthetic origin which mediate an internal binding of the ribosomes can be used as an IRES. Examples include those IRES elements from poliovirus Type I, the 5′UTR of encephalomyocarditis virus (EMV), of “Theiler's murine encephalomyelitis virus” (TMEV), of “foot and mouth disease virus” (FMDV), of “bovine enterovirus” (BEV), of “coxsackie B virus” (CBV), or of “human rhinovirus” (HRV), or the “human immunoglobulin heavy chain binding protein” (BIP) 5′UTR, the Drosophila antennapediae 5′UTR or the Drosophila ultrabithorax 5′UTR, or genetic hybrids or fragments from the above-listed sequences.

The shuttle vectors provided herein may include a splice donor and acceptor site (splicing signals or splice sites) within the transcription unit. Splicing signals are known to increase mRNA stability and protein expression levels. Splicing signals are known in the art and are further described in Sambrook, et al., Molecular Cloning, 2nd Ed., Vol. 3, Chapter 16, particularly section 7 (1989).

A polyadenylation site or signal refers to sequences necessary for the termination of transcription and for stabilizing the mRNA of eukaryotes. Such sequences are commonly available and are further described in Sambrook, et al., Molecular Cloning, 2nd Ed., Vol. 3, Chapter 16, particularly sections 6-7 (1989).

Optionally, the shuttle vector may further comprise transcription enhancers. Enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp, that act on a promoter to increase its transcription. Many enhancer sequences are now known from mammalian genes including, for example, globin, elastase, albumin, α-fetoprotein, and insulin. Typically, however, one will use an enhancer from a eukaryotic cell virus. Examples include the SV40 enhancer on the late side of the replication origin (bp 100-270), the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers. The enhancer may be spliced into the vector at a position 5′ or 3′ to a coding sequence, but is preferably located at a site 5′ from the promoter.

Optionally, the vector can be constructed so as to allow expression of the heterologous nucleic acid in yeast and/or bacterial cells. In this embodiment, the vector would further include a promoter functional in yeast and/or bacterial cells. Examples of suitable promoting sequences for use with yeast hosts include the promoters for 3-phosphoglycerate kinase [Hitzeman et al., J. Biol. Chem., 255:2073 (1980)] or other glycolytic enzymes [Hess et al., J. Adv. Enzyme Reg., 7:149 (1968); Holland, Biochemistry, 17:4900 (1978)], such as enolase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase, phosphoglucose isomerase, and glucokinase. Other yeast promoters, which are inducible promoters having the additional advantage of transcription controlled by growth conditions, include the promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid phosphatase, degradative enzymes associated with nitrogen metabolism, metallothionein, glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for maltose and galactose utilization.

Promoters for bacterial cells are known in the art and are further described, e.g., in Sambrook, et al., Molecular Cloning, 2nd Ed., Vol. 3, Chapter 17, particularly sections 11-17 (1989). Generally, promoters suitable for use with prokaryotic hosts include the β-lactamase and lactose promoter systems [Chang et al., Nature, 275:615 (1978); Goeddel et al., Nature, 281:544 (1979)], alkaline phosphatase, a tryptophan (trp) promoter system [Goeddel, Nucleic Acids Res., 8:4057 (1980); EP 36,776], and hybrid promoters such as the tac promoter [deBoer et al., Proc. Natl. Acad. Sci. USA, 80:21-25 (1983)]. Promoters for use in bacterial systems also will contain a Shine-Dalgarno (S.D.) sequence.

In an embodiment provided herein, the insertion site is linked to a selection system, i.e., a detection gene. In a preferred embodiment, from the 5′ to 3′ direction the construct comprises the mammalian promoter, the heterologous nucleic acid, the IRES site, and the selectable gene.

In a preferred embodiment, the vectors are used to screen heterologous nucleic acids. “Heterologous nucleic acids” as used herein refers to naturally occurring nucleic acids, random nucleic acids, or “biased” random nucleic acids, e.g. in nucleotide/residue frequency generally or per position. By “randomized” or grammatical equivalents herein is meant that each nucleic acid consists of essentially random nucleotides. For example, digests of prokaryotic or eukaryotic genomes may be used, or cDNA fragments. They are heterologous in that they are inserted into the shuttle vector.

In a preferred embodiment, the heterologous nucleic acids are presented to the shuttle vector in the form of a cloning vector wherein the heterologous nucleic acid is flanked by 5′ and 3′ regions identical to 5′ and 3′ regions of an insertion site (i.e., a homologous recombination site) on the shuttle vector. That is, heterologous nucleic acids are recombined into cloning vectors containing homologous recombination flanking regions. The cloning vectors and the shuttle vectors are introduced into yeast, where recombination takes place. In a preferred embodiment, the cloning vectors are linear when introduced to the yeast.
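
The flank-directed insertion described above can be illustrated in silico. The following is only a minimal sketch, not the yeast protocol itself: sequences are modeled as plain strings, and the function name `recombine`, the flank length, and the toy sequences are all hypothetical choices made for illustration.

```python
def recombine(vector: str, fragment: str, flank_len: int) -> str:
    """Model insertion of a flanked fragment at the matching site of a vector.

    The first and last `flank_len` bases of `fragment` are treated as the
    5' and 3' homology regions. Both must occur in `vector`, in that order;
    whatever lies between them in the vector is replaced by the fragment.
    """
    five, three = fragment[:flank_len], fragment[-flank_len:]
    start = vector.find(five)
    if start == -1:
        raise ValueError("5' flank not found in vector")
    end = vector.find(three, start + flank_len)
    if end == -1:
        raise ValueError("3' flank not found downstream of the 5' flank")
    return vector[:start] + fragment + vector[end + flank_len:]


# Toy (hypothetical) sequences: two 7-base flanks around a stuffer in the
# vector, and the same flanks around an insert in the linear fragment.
vector = "AAAA" + "GATTACA" + "TTTT" + "CCGGAAT" + "AAAA"
fragment = "GATTACA" + "ATGGCC" + "CCGGAAT"
product = recombine(vector, fragment, flank_len=7)
```

In this toy case the stuffer between the flanks is swapped for the insert while the rest of the vector is left unchanged, which is the outcome the yeast host is relied upon to produce.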

In one aspect of the invention, the shuttle vectors provided herein are used to transform yeast. Heterologous nucleic acids are then, or simultaneously, introduced to the yeast, and in a preferred embodiment, homologous recombination takes place such that the yeast inserts the heterologous nucleic acid into the shuttle vector at a specific insertion site, i.e., a homologous recombination site.

Transformations into yeast are typically carried out according to the method of Van Solingen et al., J. Bact., 130:946 (1977) and Hsiao et al., Proc. Natl. Acad. Sci. (USA), 76:3829 (1979). The shuttle vectors are then isolated from the yeast and used to transform mammalian cells for expression of the heterologous nucleic acid.

For transforming mammalian cells, the calcium phosphate precipitation method of Graham and van der Eb, Virology, 52:456-457 (1978) can be employed. General aspects of mammalian cell host system transformations have been described in U.S. Pat. No. 4,399,216. However, other methods for introducing DNA into cells, such as by nuclear microinjection, biolistics, electroporation, bacterial protoplast fusion with intact cells, or polycations, e.g., polybrene, polyornithine, may also be used. For various techniques for transforming mammalian cells, see Keown et al., Methods in Enzymology, 185:527-537 (1990) and Mansour et al., Nature, 336:348-352 (1988). Expression in mammalian cells is also described in Sambrook, et al., Molecular Cloning, 2nd Ed., Vol. 3, Chapter 16, particularly sections 68-72 (1989).

Isolation of the shuttle vectors is performed by standard techniques known in the art. Generally, the shuttle vectors can be isolated by breaking the cell open and separating the vector nucleic acid based on weight, e.g., by centrifugation, or size, e.g., by gel permeability. The vectors need only be isolated to the extent required to perform transformation.

In one aspect of this invention, the invention involves expression of heterologous nucleic acid inserts in a mammalian cell population. The expression of heterologous nucleic acids is identified by the production of a label or tag. Thus, when the shuttle vector expresses a heterologous nucleic acid, a selectable gene will also be expressed, thereby verifying the presence of an expressed heterologous nucleic acid.

In another aspect of the present invention, expressed heterologous nucleic acids are selected on the basis of activity or phenotype. For example, the expressed insert or the cell type expressing that particular insert can be screened for its ability to interact with an antibody or ligand, capable of specific binding to the encoded product of that insert, which has been previously bound to a solid support such as a petri dish. Positive cDNA inserts (those expressed in cell types binding to the solid support) are recovered, transformed into a convenient host (E. coli) and characterized by known recombinant DNA techniques. This procedure is also referred to as panning, and is further described in Wysocki and Sato, 1979 PNAS 75:2844-2848 and Seed and Aruffo, 1987 PNAS 84:3365-3369.

Thus, in one embodiment, the present invention allows for creating shuttle vectors with inserts therein, without necessarily requiring the skilled artisan to insert the heterologous nucleic acid into the shuttle vector. Rather, the invention herein provides for the yeast organism to perform this step in a preferred embodiment. Moreover, the present invention also allows for expression in mammalian cells, which provides a native environment for expressing mammalian genes. Additionally, the invention provides for a variety of options, such as replication in bacteria for amplification of shuttle vectors containing selected heterologous nucleic acids. Moreover, the shuttle vectors provided herein can perform the traditional aspects of expression vectors, whether or not “shuttling” is desired.

Furthermore, the present invention provides for screening for heterologous nucleic acids which encode candidate agents. “Candidate agents” as used herein are peptides which may have a desired effect on the phenotype or genotype of a cell. Heterologous nucleic acids expressing a candidate agent can be designed in a number of ways so as to facilitate their identification. Generally, this is achieved by the use of fusion partners, or combinations of fusion partners. Examples include presentation structures, targeting sequences, rescue sequences, and stability sequences, all of which can be used independently or in combination, with or without linker sequences.

By “fusion partner” or “functional group” herein is meant a sequence that is associated with the heterologous nucleic acid expressing a candidate agent, that confers upon all members of the library in that class a common function or ability. Fusion partners can be heterologous (i.e. not native to the host cell), or synthetic (not native to any cell). Suitable fusion partners include, but are not limited to: a) presentation structures, as defined below, which provide the candidate bioactive agents in a conformationally restricted or stable form; b) targeting sequences, defined below, which allow the localization of the candidate bioactive agent into a subcellular or extracellular compartment; c) rescue sequences as defined below, which allow the purification or isolation of either the candidate bioactive agents or the nucleic acids encoding them; d) stability sequences, which confer stability or protection from degradation to the candidate bioactive agent or the nucleic acid encoding it, for example resistance to proteolytic degradation; e) dimerization sequences, to allow for peptide dimerization; or f) any combination of a), b), c), d), and e), as well as linker sequences as needed.

In a preferred embodiment, the fusion partner is a presentation structure. By “presentation structure” or grammatical equivalents herein is meant a sequence, which, when fused to a heterologous nucleic acid expressing a candidate agent, causes the candidate agents to assume a conformationally restricted form. Proteins interact with each other largely through conformationally constrained domains. Although small peptides with freely rotating amino and carboxyl termini can have potent functions as is known in the art, the conversion of such peptide structures into pharmacologic agents is difficult due to the inability to predict side-chain positions for peptidomimetic synthesis. Therefore the presentation of peptides in conformationally constrained structures will both benefit the later generation of pharmaceuticals and likely lead to higher affinity interactions of the peptide with the target protein. This fact has been recognized in the combinatorial library generation systems using biologically generated short peptides in bacterial phage systems. A number of workers have constructed small domain molecules in which one might present randomized peptide structures.

Suitable presentation structures include, but are not limited to, minibody structures, loops on beta-sheet turns and coiled-coil stem structures in which residues not critical to structure are randomized, zinc-finger domains, cysteine-linked (disulfide) structures, transglutaminase linked structures, cyclic peptides, B-loop structures, helical barrels or bundles, leucine zipper motifs, etc.

In a preferred embodiment, the presentation structure is a coiled-coil structure, allowing the presentation of the randomized peptide on an exterior loop. See, for example, Myszka et al., Biochem. 33:2362-2373 (1994), hereby incorporated by reference. Using this system, investigators have isolated peptides capable of high affinity interaction with the appropriate target. In general, coiled-coil structures allow for between 6 and 20 randomized positions.

A preferred coiled-coil presentation structure is as follows: MGCAALESEVSALESEVAS LE SEVAALGRGDMPLAAVKS KL SAVKSKLAS VKSKLAACGPP (SEQ ID NO:6). The underlined regions represent a coiled-coil leucine zipper region defined previously (see Martin et al., EMBO J. 13(22):5303-5309 (1994), incorporated by reference). The bolded GRGDMP region represents the loop structure and when appropriately replaced with randomized peptides (i.e., candidate bioactive agents, generally depicted herein as (X)_(n), where X is an amino acid residue and n is an integer of at least 5 or 6) can be of variable length. The replacement of the bolded region is facilitated by encoding restriction endonuclease sites in the underlined regions, which allows the direct incorporation of randomized oligonucleotides at these positions. For example, a preferred embodiment generates a XhoI site at the double-underlined LE site and a HindIII site at the double-underlined KL site.
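
The loop replacement described above can be sketched at the peptide level. This is an illustrative sketch only: it operates on the amino acid sequence of SEQ ID NO:6 rather than on the encoding oligonucleotides, and the helper names `replace_loop` and `random_peptide`, the minimum length check, and the uniform choice among the 20 standard residues are assumptions, not part of the described method.

```python
import random

# Scaffold from SEQ ID NO:6; the spaces in the printed sequence only set off
# the LE and KL sites, so they are stripped here.
SCAFFOLD = "".join(
    "MGCAALESEVSALESEVAS LE SEVAALGRGDMPLAAVKS KL SAVKSKLAS VKSKLAACGPP".split()
)
LOOP = "GRGDMP"                       # loop region to be replaced
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def random_peptide(n: int, rng: random.Random) -> str:
    """Return n residues drawn uniformly at random (an unbiased library)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(n))


def replace_loop(peptide: str) -> str:
    """Swap the GRGDMP loop of the scaffold for a candidate peptide (X)n."""
    if len(peptide) < 5:
        raise ValueError("the text calls for n of at least 5 or 6")
    if SCAFFOLD.count(LOOP) != 1:
        raise ValueError("loop must occur exactly once in the scaffold")
    return SCAFFOLD.replace(LOOP, peptide)


rng = random.Random(0)  # seeded only to make the sketch repeatable
library = [replace_loop(random_peptide(8, rng)) for _ in range(3)]
```

Each library member keeps the flanking coiled-coil stems intact and differs only in the exterior loop.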

In a preferred embodiment, the presentation structure is a minibody structure. A “minibody” is essentially composed of a minimal antibody complementarity region. The minibody presentation structure generally provides two randomizing regions that in the folded protein are presented along a single face of the tertiary structure. See, for example, Bianchi et al., J. Mol. Biol. 236(2):649-59 (1994), and references cited therein, all of which are incorporated by reference. Investigators have shown this minimal domain is stable in solution and have used phage selection systems in combinatorial libraries to select minibodies with peptide regions exhibiting high affinity, Kd=10⁻⁷, for the pro-inflammatory cytokine IL-6.

A preferred minibody presentation structure is as follows: MGRNSQATSGFTFSHFYMEWVRGGEYIAASRHKHNKYTTEYSASVKGR YIVSRDTSQSILYLQKKKGPP (SEQ ID NO:7). The bold, underlined regions are the regions which may be randomized. The italicized phenylalanine must be invariant in the first randomizing region. The entire peptide is cloned in a three-oligonucleotide variation of the coiled-coil embodiment, thus allowing two different randomizing regions to be incorporated simultaneously. This embodiment utilizes non-palindromic BstXI sites on the termini.

In a preferred embodiment, the presentation structure is a sequence that contains generally two cysteine residues, such that a disulfide bond may be formed, resulting in a conformationally constrained sequence. This embodiment is particularly preferred when secretory targeting sequences are used. As will be appreciated by those in the art, any number of random sequences, with or without spacer or linking sequences, may be flanked with cysteine residues. In other embodiments, effective presentation structures may be generated by the random regions themselves. For example, the random regions may be “doped” with cysteine residues which, under the appropriate redox conditions, may result in highly crosslinked structured conformations, similar to a presentation structure. Similarly, the randomization regions may be controlled to contain a certain number of residues to confer β-sheet or α-helical structures.

In a preferred embodiment, the fusion partner is a targeting sequence. As will be appreciated by those in the art, the localization of proteins within a cell is a simple method for increasing effective concentration and determining function. For example, RAF1 when localized to the mitochondrial membrane can inhibit the anti-apoptotic effect of BCL-2. Similarly, membrane bound Sos induces Ras mediated signaling in T-lymphocytes. These mechanisms are thought to rely on the principle of limiting the search space for ligands, that is to say, the localization of a protein to the plasma membrane limits the search for its ligand to that limited dimensional space near the membrane as opposed to the three dimensional space of the cytoplasm. Alternatively, the concentration of a protein can also be simply increased by nature of the localization. Shuttling the proteins into the nucleus confines them to a smaller space, thereby increasing concentration. Finally, the ligand or target may simply be localized to a specific compartment, and inhibitors must be localized appropriately.

Thus, suitable targeting sequences include, but are not limited to, binding sequences capable of causing binding of the expression product to a predetermined molecule or class of molecules while retaining bioactivity of the expression product (for example by using enzyme inhibitor or substrate sequences to target a class of relevant enzymes); sequences signalling selective degradation, of itself or co-bound proteins; and signal sequences capable of constitutively localizing the candidate expression products to a predetermined cellular locale, including a) subcellular locations such as the Golgi, endoplasmic reticulum, nucleus, nucleoli, nuclear membrane, mitochondria, chloroplast, secretory vesicles, lysosome, and cellular membrane; and b) extracellular locations via a secretory signal. Particularly preferred is localization to either subcellular locations or to the outside of the cell via secretion.

In a preferred embodiment, the targeting sequence is a nuclear localization signal (NLS). NLSs are generally short, positively charged (basic) domains that serve to direct the entire protein in which they occur to the cell's nucleus. Numerous NLS amino acid sequences have been reported including single basic NLSs such as that of the SV40 (monkey virus) large T Antigen (Pro Lys Lys Lys Arg Lys Val) (SEQ ID NO:8), Kalderon et al., Cell, 39:499-509 (1984); the human retinoic acid receptor-β nuclear localization signal (ARRRRP) (SEQ ID NO:9); NFκB p50 (EEVQRKRQKL (SEQ ID NO:10); Ghosh et al., Cell 62:1019 (1990)); NFκB p65 (EEKRKRTYE (SEQ ID NO:11); Nolan et al., Cell 64:961 (1991)); and others (see for example Boulikas, J. Cell. Biochem. 55(1):32-58 (1994), hereby incorporated by reference) and double basic NLSs exemplified by that of the Xenopus (African clawed toad) protein, nucleoplasmin (Ala Val Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys Leu Asp) (SEQ ID NO:12), Dingwall, et al., Cell, 30:449-458, 1982 and Dingwall, et al., J. Cell Biol., 107:641-849 (1988). Numerous localization studies have demonstrated that NLSs incorporated in synthetic peptides or grafted onto selectable proteins not normally targeted to the cell nucleus cause these peptides and selectable proteins to be concentrated in the nucleus. See, for example, Dingwall and Laskey, Ann. Rev. Cell Biol., 2:367-390, 1986; Bonnerot, et al., Proc. Natl. Acad. Sci. USA, 84:6795-6799, 1987; Galileo, et al., Proc. Natl. Acad. Sci. USA, 87:458-462, 1990.

In a preferred embodiment, the targeting sequence is a membrane anchoring signal sequence. This is particularly useful since many parasites and pathogens bind to the membrane, in addition to the fact that many intracellular events originate at the plasma membrane. Thus, membrane-bound peptide libraries are useful for both the identification of important elements in these processes as well as for the discovery of effective inhibitors. The invention provides methods for presenting the randomized expression product extracellularly or in the cytoplasmic space. For extracellular presentation, a membrane anchoring region is provided at the carboxyl terminus of the peptide presentation structure. The randomized expression product region is expressed on the cell surface and presented to the extracellular space, such that it can bind to other surface molecules (affecting their function) or molecules present in the extracellular medium. The binding of such molecules could confer function on the cells expressing a peptide that binds the molecule. The cytoplasmic region could be neutral or could contain a domain that, when the extracellular randomized expression product region is bound, confers a function on the cells (activation of a kinase, phosphatase, binding of other cellular components to effect function). Similarly, the randomized expression product-containing region could be contained within a cytoplasmic region, and the transmembrane region and extracellular region remain constant or have a defined function.

Membrane-anchoring sequences are well known in the art and are based on the genetic geometry of mammalian transmembrane molecules. Peptides are inserted into the membrane based on a signal sequence (designated herein as ssTM) and require a hydrophobic transmembrane domain (herein TM). The transmembrane proteins are inserted into the membrane such that the regions encoded 5′ of the transmembrane domain are extracellular and the sequences 3′ become intracellular. Of course, if these transmembrane domains are placed 5′ of the variable region, they will serve to anchor it as an intracellular domain, which may be desirable in some embodiments. ssTMs and TMs are known for a wide variety of membrane bound proteins, and these sequences may be used accordingly, either as pairs from a particular protein or with each component being taken from a different protein, or alternatively, the sequences may be synthetic, and derived entirely from consensus as artificial delivery domains.

As will be appreciated by those in the art, membrane-anchoring sequences, including both ssTM and TM, are known for a wide variety of proteins and any of these may be used. Particularly preferred membrane-anchoring sequences include, but are not limited to, those derived from CD8, ICAM-2, IL-8R, CD4 and LFA-1.

Useful sequences include sequences from: 1) class I integral membrane proteins such as IL-2 receptor beta-chain (residues 1-26 are the signal sequence, 241-265 are the transmembrane residues; see Hatakeyama et al., Science 244:551 (1989) and von Heijne et al., Eur. J. Biochem. 174:671 (1988)) and insulin receptor beta chain (residues 1-27 are the signal, 957-959 are the transmembrane domain and 960-1382 are the cytoplasmic domain; see Hatakeyama, supra, and Ebina et al., Cell 40:747 (1985)); 2) class II integral membrane proteins such as neutral endopeptidase (residues 29-51 are the transmembrane domain, 2-28 are the cytoplasmic domain; see Malfroy et al., Biochem. Biophys. Res. Commun. 144:59 (1987)); 3) type III proteins such as human cytochrome P450 NF25 (Hatakeyama, supra); and 4) type IV proteins such as human P-glycoprotein (Hatakeyama, supra). Particularly preferred are CD8 and ICAM-2. For example, the signal sequences from CD8 and ICAM-2 lie at the extreme 5′ end of the transcript. These consist of the amino acids 1-32 in the case of CD8 (MASPLTRFLSLNLLLLGESILGSGEAKPQAP (SEQ ID NO:13); Nakauchi et al., PNAS USA 82:5126 (1985)) and 1-21 in the case of ICAM-2 (MSSFGYRTLTVALFTLICCPG (SEQ ID NO:14); Staunton et al., Nature (London) 339:61 (1989)). These leader sequences deliver the construct to the membrane while the hydrophobic transmembrane domains, placed 3′ of the random candidate region, serve to anchor the construct in the membrane. These transmembrane domains are encompassed by amino acids 145-195 from CD8 (PQRPEDCRPRGSVKGTGLDFACDIYIWAPLAGICVALLLSLIITLICYHSR (SEQ ID NO:15); Nakauchi, supra) and 224-256 from ICAM-2 (MVIIVTVVSVLLSLFVTSVLLCFIFGQHLRQQR (SEQ ID NO:16); Staunton, supra).

Alternatively, membrane anchoring sequences include the GPI anchor, which results in a covalent bond between the molecule and the lipid bilayer via a glycosyl-phosphatidylinositol bond, for example in DAF (PNKGSGTTSGTTRLLSGHTCFTLTGLLGTLVTMGLLT (SEQ ID NO:17), with the bolded serine the site of the anchor; see Homans et al., Nature 333(6170):269-72 (1988), and Moran et al., J. Biol. Chem. 266:1250 (1991)). In order to do this, the GPI sequence from Thy-1 can be cassetted 3′ of the variable region in place of a transmembrane sequence.

Similarly, myristylation sequences can serve as membrane anchoring sequences. It is known that the myristylation of c-src recruits it to the plasma membrane. This is a simple and effective method of membrane localization, given that the first 14 amino acids of the protein are solely responsible for this function: MGSSKSKPKDPSQR (SEQ ID NO:18) (see Cross et al., Mol. Cell. Biol. 4(9):1834 (1984); Spencer et al., Science 262:1019-1024 (1993), both of which are hereby incorporated by reference). This motif has already been shown to be effective in the localization of selectable genes and can be used to anchor the zeta chain of the TCR. This motif is placed 5′ of the variable region in order to localize the construct to the plasma membrane. Other modifications such as palmitoylation can be used to anchor constructs in the plasma membrane; for example, palmitoylation sequences from the G protein-coupled receptor kinase GRK6 sequence (LLQRLFSRQDCCGNCSDSEEELPTRL (SEQ ID NO:19), with the bold cysteines being palmitoylated; Stoffel et al., J. Biol. Chem 269:27791 (1994)); from rhodopsin (KQFRNCMLTSLCCGKNPLGD (SEQ ID NO:20); Barnstable et al., J. Mol. Neurosci. 5(3):207 (1994)); and the p21 H-ras 1 protein (LNPPDESGPGCMSCKCVLS (SEQ ID NO:21); Capon et al., Nature 302:33 (1983)).

In a preferred embodiment, the targeting sequence is a lysosomal targeting sequence, including, for example, a lysosomal degradation sequence such as Lamp-2 (KFERQ (SEQ ID NO:22); Dice, Ann. N.Y. Acad. Sci. 674:58 (1992)); or lysosomal membrane sequences from Lamp-1 (MLIPIAGFFALAGLVLIVLIAYLIGRKRSHAGYQTI (SEQ ID NO:23), Uthayakumar et al., Cell. Mol. Biol. Res. 41:405 (1995)) or Lamp-2 (LVPIAVGAALAGVLILVLLAYFIGLKHHHAGYEQF (SEQ ID NO:24), Konecki et al., Biochem. Biophys. Res. Comm. 205:1-5 (1994), both of which show the transmembrane domains in italics and the cytoplasmic targeting signal underlined).

Alternatively, the targeting sequence may be a mitochondrial localization sequence, including mitochondrial matrix sequences (e.g. yeast alcohol dehydrogenase III; MLRTSSLFTRRVQPSLFSRNILRLQST (SEQ ID NO:25); Schatz, Eur. J. Biochem. 165:1-6 (1987)); mitochondrial inner membrane sequences (yeast cytochrome c oxidase subunit IV; MLSLRQSIRFFKPATRTLCSSRYLL (SEQ ID NO:26); Schatz, supra); mitochondrial intermembrane space sequences (yeast cytochrome c1; MFSMLSKRWAQRTLSKSFYSTATGAASKSGKLTQKLVTAGVAAAGITAST LLYADSLTAEAMTA (SEQ ID NO:27); Schatz, supra) or mitochondrial outer membrane sequences (yeast 70 kD outer membrane protein; MKSFITRNKTAILATVAATGTAIGAYYYYNQLQQQQQRGKK (SEQ ID NO:28); Schatz, supra).

The targeting sequences may also be endoplasmic reticulum sequences, including the sequences from calreticulin (KDEL (SEQ ID NO:29); Pelham, Royal Society London Transactions B; 1-10 (1992)) or adenovirus E3/19K protein (LYLSRRSFIDEKKMP (SEQ ID NO:30); Jackson et al., EMBO J. 9:3153 (1990)).

Furthermore, targeting sequences also include peroxisome sequences (for example, the peroxisome matrix sequence from Luciferase; SKL (SEQ ID NO:32); Keller et al., PNAS USA 4:3264 (1987)); farnesylation sequences (for example, P21 H-ras 1; LNPPDESGPGCMSCKCVLS (SEQ ID NO:31), with the bold cysteine farnesylated; Capon, supra); geranylgeranylation sequences (for example, protein rab-5A; LTEPTQPTRNQCCSN (SEQ ID NO:33), with the bold cysteines geranylgeranylated; Farnsworth, PNAS USA 91:11963 (1994)); or destruction sequences (cyclin B1; RTALGDIGN (SEQ ID NO:34); Klotzbucher et al., EMBO J. 1:3053 (1996)).

In a preferred embodiment, the targeting sequence is a secretory signal sequence capable of effecting the secretion of the candidate agent. There are a large number of known secretory signal sequences which are placed 5′ to the variable peptide region, and are cleaved from the peptide region to effect secretion into the extracellular space. Secretory signal sequences and their transferability to unrelated proteins are well known, e.g., Silhavy, et al. (1985) Microbiol. Rev. 49, 398-418. This is particularly useful to generate a peptide capable of binding to the surface of, or affecting the physiology of, a target cell that is other than the host cell, e.g., the cell infected with the retrovirus. In a preferred approach, a fusion product is configured to contain, in series, secretion signal peptide-presentation structure-randomized expression product region-presentation structure. In this manner, target cells grown in the vicinity of cells caused to express the library of peptides are bathed in secreted peptide. Target cells exhibiting a physiological change in response to the presence of a peptide, e.g., by the peptide binding to a surface receptor or by being internalized and binding to intracellular targets, and the secreting cells are localized by any of a variety of selection schemes and the peptide causing the effect determined. Exemplary effects include variously that of a designer cytokine (i.e., a stem cell factor capable of causing hematopoietic stem cells to divide and maintain their totipotential), a factor causing cancer cells to undergo spontaneous apoptosis, a factor that binds to the cell surface of target cells and labels them specifically, etc.

Suitable secretory sequences are known, including signals from IL-2 (MYRMQLLSCIALSLALVTNS (SEQ ID NO:35); Villinger et al., J. Immunol. 155:3946 (1995)), growth hormone (MATGSRTSLLLAFGLLCLPWLQEGSAFPT (SEQ ID NO:36); Roskam et al., Nucleic Acids Res. 7:30 (1979)); preproinsulin (MALWMRLLPLLALLALWGPDPAAAFVN (SEQ ID NO:37); Bell et al., Nature 284:26 (1980)); and influenza HA protein (MKAKLLVLLYAFVAGDQI (SEQ ID NO:38); Sekikawa et al., PNAS 80:3563), with cleavage between the non-underlined-underlined junction. A particularly preferred secretory signal sequence is the signal leader sequence from the secreted cytokine IL-4, which comprises the first 24 amino acids of IL-4 as follows: MGLTSQLLPPLFFLLACAGNFVHG (SEQ ID NO:39).

In a preferred embodiment, the fusion partner is a rescue sequence. A rescue sequence is a sequence which may be used to purify or isolate either the candidate agent or the heterologous nucleic acid encoding it. Thus, for example, peptide rescue sequences include purification sequences such as the His₆ tag for use with Ni affinity columns and epitope tags for detection, immunoprecipitation or FACS (fluorescence-activated cell sorting). Suitable epitope tags include myc (for use with the commercially available 9E10 antibody), the BSP biotinylation target sequence of the bacterial enzyme BirA, flu tags, lacZ, and GST.

Alternatively, the rescue sequence may be a unique oligonucleotide sequence which serves as a probe target site to allow the quick and easy isolation of the retroviral construct, via PCR, related techniques, or hybridization.

In a preferred embodiment, the fusion partner is a stability sequence to confer stability to the candidate bioactive agent or the heterologous nucleic acid encoding it. Thus, for example, peptides may be stabilized by the incorporation of glycines after the initiation methionine (MG or MGG), for protection of the peptide from ubiquitination as per Varshavsky's N-End Rule, thus conferring long half-life in the cytoplasm. Similarly, two prolines at the C-terminus render peptides largely resistant to carboxypeptidase action. The presence of two glycines prior to the prolines both imparts flexibility and prevents structure-initiating events in the di-proline from being propagated into the candidate peptide structure. Thus, preferred stability sequences are as follows: MG(X)_(n)GGPP, where X is any amino acid and n is an integer of at least four (SEQ ID NO:40).
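
The MG(X)_(n)GGPP layout can be expressed as a simple pattern. The following is a minimal sketch under stated assumptions: X is restricted to the 20 standard one-letter residues, and the names `add_stability_sequence` and `has_stability_sequence` are hypothetical helpers introduced only for illustration.

```python
import re

# MG, then at least four residues (X)n, then GGPP, spanning the whole peptide.
# Assumption: X is limited to the 20 standard one-letter amino acid codes.
STABILITY_PATTERN = re.compile(r"MG[ACDEFGHIKLMNPQRSTVWY]{4,}GGPP")


def add_stability_sequence(core: str) -> str:
    """Wrap a candidate peptide (X)n in the N-terminal MG and C-terminal GGPP."""
    if len(core) < 4:
        raise ValueError("n must be at least four")
    return "MG" + core + "GGPP"


def has_stability_sequence(peptide: str) -> bool:
    """Check whether a peptide matches the MG(X)nGGPP layout end to end."""
    return STABILITY_PATTERN.fullmatch(peptide) is not None


stabilized = add_stability_sequence("ACDEFW")
```

A check of this kind could be used, for instance, to confirm that sequenced library members retained both protective termini.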

In one embodiment, the fusion partner is a dimerization sequence. A dimerization sequence allows the non-covalent association of one random peptide to another random peptide, with sufficient affinity to remain associated under normal physiological conditions. This effectively allows small libraries of random peptides (for example, 10⁴) to become large libraries if two peptides per cell are generated which then dimerize, to form an effective library of 10⁸ (10⁴×10⁴). It also allows the formation of longer random peptides, if needed, or more structurally complex random peptide molecules. The dimers may be homo- or heterodimers.
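
The library-size arithmetic above can be worked through directly. This is a sketch only; the distinction drawn between ordered and unordered pairs is an added observation, not a statement from the text. The 10⁸ figure corresponds to counting ordered pairs, as when each partner carries a distinct dimerization sequence; if a single self-associating sequence is used, the pair A·B is the same dimer as B·A and the count of distinct dimers is smaller.

```python
def ordered_pairs(n: int) -> int:
    """Dimers when the two partners are distinguishable: n * n."""
    return n * n


def unordered_pairs(n: int) -> int:
    """Distinct dimers when partners are interchangeable: n * (n + 1) / 2."""
    return n * (n + 1) // 2


library_size = 10**4
heterodimer_library = ordered_pairs(library_size)    # 100_000_000, i.e. 10**8
self_pairing_library = unordered_pairs(library_size) # 50_005_000
```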

Dimerization sequences may be a single sequence that self-aggregates, or two sequences, each of which is generated in a different retroviral construct. That is, nucleic acids encoding both a first random peptide with dimerization sequence 1, and a second random peptide with dimerization sequence 2, such that upon introduction into a cell and expression of the nucleic acid, dimerization sequence 1 associates with dimerization sequence 2 to form a new random peptide structure.

Suitable dimerization sequences will encompass a wide variety of sequences. Any number of protein-protein interaction sites are known. In addition, dimerization sequences may also be elucidated using standard methods such as the yeast two hybrid system, traditional biochemical affinity binding studies, or even using the present methods.

The fusion partners may be placed anywhere (i.e. N-terminal, C-terminal, internal) in the structure as the biology and activity permits.

In a preferred embodiment, the fusion partner includes a linker or tethering sequence, as generally described in PCT US97/01019, that can allow the candidate agents to interact with potential targets unhindered. For example, when the candidate bioactive agent is a peptide, useful linkers include glycine-serine polymers (including, for example, (GS)_(n) (SEQ ID NO:41), (GSGGS)_(n) (SEQ ID NO:42) and (GGGS)_(n) (SEQ ID NO:43), where n is an integer of at least one), glycine-alanine polymers, alanine-serine polymers, and other flexible linkers such as the tether for the shaker potassium channel, and a large variety of other flexible linkers, as will be appreciated by those in the art. Glycine-serine polymers are preferred for three reasons. First, both of these amino acids are relatively unstructured, and therefore may be able to serve as a neutral tether between components. Second, serine is hydrophilic and therefore able to solubilize what could be a globular glycine chain. Third, similar chains have been shown to be effective in joining subunits of recombinant proteins such as single chain antibodies.
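
The repeat-unit linkers named above are easy to generate programmatically. The following is a minimal sketch; the helper names `make_linker` and `join_with_linker`, the default unit and repeat count, and the two short example components are hypothetical and chosen only for illustration.

```python
# Repeat units of the glycine-serine linkers named in the text.
LINKER_UNITS = ("GS", "GSGGS", "GGGS")


def make_linker(unit: str, n: int) -> str:
    """Return (unit)n for one of the listed repeat units, with n >= 1."""
    if unit not in LINKER_UNITS:
        raise ValueError(f"unknown linker unit: {unit}")
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


def join_with_linker(left: str, right: str, unit: str = "GGGS", n: int = 2) -> str:
    """Tether two peptide components through a flexible linker."""
    return left + make_linker(unit, n) + right


# Hypothetical two-residue components, joined through (GGGS)2.
fusion = join_with_linker("MK", "WY")
```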

In addition, the fusion partners, including presentation structures, may be modified, randomized, and/or matured to alter the presentation orientation of the randomized expression product. For example, determinants at the base of the loop may be modified to slightly modify the internal loop peptide tertiary structure while maintaining the randomized amino acid sequence.

Thus, heterologous nucleic acids can be sequences which have not been manipulated in any way, or alternatively, they can be constructed to have fusion partners. In either case, they can be inserted into the shuttle vectors by conventional methods such as enzymatic manipulation and ligation, or preferably, are inserted into the shuttle vector by homologous recombination as described herein.

It is understood by the skilled artisan that while various options (of compounds, properties selected or order of steps) are provided herein, the options are also each provided individually, and can each be individually segregated from the other options provided herein. Moreover, steps which are obvious and known in the art are intended to be within the scope of this invention. For example, there may be additional washing, segregation, or isolation steps. Moreover, additional components to vectors, particularly regulatory elements, cells, cell media, etc., which are routine and known in the art can be incorporated herein without deviating from the spirit and scope of the invention.

The following examples serve to more fully describe the manner of using the above-described invention, as well as to set forth the best modes contemplated for carrying out various aspects of the invention. It is understood that these examples in no way serve to limit the true scope of this invention, but rather are presented for illustrative purposes. All references cited herein are expressly incorporated by reference in their entirety.

EXAMPLES

Example 1 Mammalian Cells Transfected with a Shuttle Vector Show Expression

A shuttle vector (pPYC-R) was constructed in accordance with the schematic shown in FIG. 1. The sequence is provided in FIG. 2. The vector has an IRES at positions 6001-6505, a GFP at 6506-7258, Amp^(R) at 9888-655, an E. coli replication origin at 656-1456, a yeast 2μ replication origin at 1461-2808, Trp at 3344-4018 and a CMV promoter at 4853-5614 of SEQ ID NO:1 shown in FIG. 2.
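Because the vector is circular, a feature such as Amp^(R) at 9888-655 runs through the end of the numbered sequence and continues from position 1. The following minimal sketch computes the length of each listed feature under the assumption that positions are 1-based and inclusive and that the plasmid length is the 10100 nucleotides stated for SEQ ID NO:1 in the sequence listing. The names `FEATURES` and `feature_length` are hypothetical and introduced only for illustration.

```python
# Length of SEQ ID NO:1 as stated in the sequence listing.
PLASMID_LENGTH = 10100

# Feature coordinates copied from the description of pPYC-R
# (assumed 1-based, inclusive).
FEATURES = {
    "IRES": (6001, 6505),
    "GFP": (6506, 7258),
    "AmpR": (9888, 655),  # wraps around the origin of numbering
    "E. coli ori": (656, 1456),
    "yeast 2 micron ori": (1461, 2808),
    "Trp": (3344, 4018),
    "CMV promoter": (4853, 5614),
}


def feature_length(start: int, end: int, plasmid_length: int) -> int:
    """Return the feature length on a circular molecule.

    If start > end, the feature is taken to wrap through the last
    position and continue from position 1.
    """
    if start <= end:
        return end - start + 1
    return (plasmid_length - start + 1) + end


if __name__ == "__main__":
    for name, (start, end) in FEATURES.items():
        print(f"{name}: {feature_length(start, end, PLASMID_LENGTH)} nt")
```

Under these assumptions the wrapped Amp^(R) feature spans 213 nucleotides before the numbering origin and 655 after it.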

1 μg of pPYC-R plasmid was transfected into 30% confluent 293 (Phoenix) cells by a standard Ca²⁺ phosphate transfection method known in the art to test expression of GFP. After incubation for 48 hours in a 37° C. CO₂ incubator, cells transfected with pPYC-R showed green fluorescence under a UV microscope, as depicted in FIG. 3.

Example 2 Use of Yeast to Construct Shuttle Vector with Insert, Expression of Insert

This example demonstrates the in-frame fusion of Rip cDNA, an apoptosis-inducing gene when over-expressed in mammalian cells, to a hemagglutinin (HA) tag in pPYC by recombination with a non-virus based vector. Rip is further described in Hsu, et al., Immunity, 4:387-396 (1996), incorporated herein by reference.

1 μg of pPYC plasmid (FIGS. 4 and 5) was linearized by cutting with EcoR I and was purified from an agarose gel. Rip cDNA was amplified by PCR and was purified from an agarose gel. The oligonucleotide sequences used to amplify Rip were

ACGACTCACTATAGGCTAGCCGCCACCATGGCTTACCCATACGATGTTCCAGATTACGCTGGGCAACCAGACATGTCCTTGAA (SEQ ID NO:3) and TTGCCAAAAGACGGCAATATGGTGGAAAATAACGTGTCGTACTCTAGAGGTACCACGCGTGTTAGTTCTGGCTGACGTAAA (SEQ ID NO:4).

Flanking sequences required for homologous recombination between the PCR fragment and the vector are underlined. The purified vector and PCR fragment were co-transfected into yeast by a standard Li/PEG method known in the art. Transformants were plated on SD-W selection plates and were incubated in a 30° C. incubator for 4 days. Colonies were harvested and pooled for plasmid mini-preparation to recover recombinant plasmid from yeast. The plasmid from the mini-preparation was transformed into E. coli to isolate single colonies on LB plus 50 μg/ml ampicillin. Five colonies were picked and grown up for plasmid mini-preparation and subsequent restriction enzyme digestion and sequencing verification.
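As an illustrative check of how a reverse primer relates to the recovered plasmid, the reverse complement of the primer can be located on the top strand of the construct. The sketch below uses the 80-nucleotide SEQ ID NO:4 as it appears in the sequence listing and a short excerpt transcribed from SEQ ID NO:2 in the same listing. The names `reverse_complement`, `REVERSE_PRIMER` and `SEQ2_EXCERPT` are hypothetical, and the excerpt is an assumption of this sketch in that it was copied by hand from the listing.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# SEQ ID NO:4 as given in the sequence listing (80 nt).
REVERSE_PRIMER = (
    "TTGCCAAAAGACGGCAATATGGTGGAAAATAACGTGTCGACTCTAGAGGT"
    "ACCACGCGTGTTAGTTCTGGCTGACGTAAA"
)

# Hand-copied excerpt of SEQ ID NO:2 from the sequence listing
# (an assumption of this sketch, not a verified coordinate range).
SEQ2_EXCERPT = (
    "CTTGATTTACGTCAGCCAGAACTAACACGCGTGGTACCTCTAGAGTCGAC"
    "ACGTTATTTTCCACCATATTGCCGTCTTTTGGCAATGTGA"
)

if __name__ == "__main__":
    top_strand = reverse_complement(REVERSE_PRIMER)
    position = SEQ2_EXCERPT.find(top_strand)  # -1 if not found
    print("Reverse complement of primer:", top_strand)
    print("0-based offset within excerpt:", position)
```

A match indicates that the primer, read on the opposite strand, is contiguous with the construct sequence across the junction between insert and vector, which is the arrangement homologous recombination is expected to produce.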

Clones with Rip cDNA inserted in-frame downstream of the tag (HA) were co-transfected with pGDB, an apoptosis reporter vector, into 30% confluent mammalian 293 (Phoenix) cells by a Ca²⁺ phosphate transfection method to test expression of Rip. FIG. 6 shows the results: extensive cell death due to the expression of HA-tagged Rip.
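The in-frame arrangement of the tag and the insert can be illustrated by translating the forward primer (SEQ ID NO:3, as given in the sequence listing) from its first ATG using the standard genetic code. This is a minimal sketch; the names `CODON_TABLE` and `translate_from_first_atg` are hypothetical, and the sketch assumes that the first ATG in the primer is the intended start codon.

```python
# Standard genetic code, built in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQ ID NO:3 as given in the sequence listing (83 nt).
FORWARD_PRIMER = (
    "ACGACTCACTATAGGCTAGCCGCCACCATGGCTTACCCATACGATGTTCC"
    "AGATTACGCTGGGCAACCAGACATGTCCTTGAA"
)

# Commonly cited HA epitope sequence.
HA_EPITOPE = "YPYDVPDYA"


def translate_from_first_atg(dna: str) -> str:
    """Translate complete codons starting at the first ATG; '*' marks stops."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start < 0:
        return ""
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    peptide = translate_from_first_atg(FORWARD_PRIMER)
    print(peptide)
    print("HA epitope in frame:", HA_EPITOPE in peptide)
```

In this reading frame the HA epitope is followed, without an intervening stop codon, by the residues encoded by the insert-specific 3′ portion of the primer.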

Example 3 Mammalian Cells Transfected with a Shuttle Vector Show Expression

A shuttle vector (pCRU5YMS) was constructed in accordance with the schematic shown in FIG. 7. The sequence is provided in FIG. 8.

One microgram of pCRU5YMS was used to transfect the virus packaging cell line Phoenix (293) by a standard CaPO₄ transfection method. Under a UV microscope, green fluorescence can be observed after 24 hours, indicating that the GFP has been expressed by the pCRU5 promoter. After 48 hours of incubation at 30° C., medium containing newly packaged viruses was harvested for a second round of infection of HeLa cells. 48 hours after infection, green fluorescence can be observed in most of the HeLa cells under the UV microscope, as shown in FIG. 9.

                   #             SEQUENCE LISTING<160> NUMBER OF SEQ ID NOS: 43 <210> SEQ ID NO 1 <211> LENGTH: 10100<212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE:<223> OTHER INFORMATION: Description of Artificial  #Sequence:      constructed vectors <400> SEQUENCE: 1caatgatgag cacttttaaa gttctgctat gtggcgcggt attatcccgt gt#tgacgccg     60ggcaagagca actcggtcgc cgcatacact attctcagaa tgacttggtt ga#gtactcac    120cagtcacaga aaagcatctt acggatggca tgacagtaag agaattatgc ag#tgctgcca    180taaccatgag tgataacact gcggccaact tacttctgac aacgatcgga gg#accgaagg    240agctaaccgc ttttttgcac aacatggggg atcatgtaac tcgccttgat cg#ttgggaac    300cggagctgaa tgaagccata ccaaacgacg agcgtgacac cacgatgcct gt#agcaatgg    360caacaacgtt gcgcaaacta ttaactggcg aactacttac tctagcttcc cg#gcaacaat    420taatagactg gatggaggcg gataaagttg caggaccact tctgcgctcg gc#ccttccgg    480ctggctggtt tattgctgat aaatctggag ccggtgagcg tgggtctcgc gg#tatcattg    540cagcactggg gccagatggt aagccctccc gtatcgtagt tatctacacg ac#ggggagtc    600aggcaactat ggatgaacga aatagacaga tcgctgagat aggtgcctca ct#gattaagc    660attggtaact gtcagaccaa gtttactcat atatacttta gattgatttg cg#gccgcaaa    720cttcattttt aatttaaaag gatctaggtg aagatccttt ttgataatct ca#tgaccaaa    780atcccttaac gtgagttttc gttccactga gcgtcagacc ccgtagaaaa ga#tcaaagga    840tcttcttgag atcctttttt tctgcgcgta atctgctgct tgcaaacaaa aa#aaccaccg    900ctaccagcgg tggtttgttt gccggatcaa gagctaccaa ctctttttcc ga#aggtaact    960ggcttcagca gagcgcagat accaaatact gtccttctag tgtagccgta gt#taggccac   1020cacttcaaga actctgtagc accgcctaca tacctcgctc tgctaatcct gt#taccagtg   1080gctgctgcca gtggcgataa gtcgtgtctt accgggttgg actcaagacg at#agttaccg   1140gataaggcgc agcggtcggg ctgaacgggg ggttcgtgca cacagcccag ct#tggagcga   1200acgacctaca ccgaactgag atacctacag cgtgagctat gagaaagcgc ca#cgcttccc   1260gaagggagaa aggcggacag gtatccggta agcggcaggg tcggaacagg ag#agcgcacg   1320agggagcttc cagggggaaa cgcctggtat ctttatagtc ctgtcgggtt tc#gccacctc   1380tgacttgagc gtcgattttt gtgatgctcg 
tcaggggggc ggagcctatg ga#aaaacgcc   1440agcaacgcgg ccttttgctc gaacgaagca tctgtgcttc attttgtaga ac#aaaaatgc   1500aacgcgagag cgctaatttt tcaaacaaag aatctgagct gcatttttac ag#aacagaaa   1560tgcaacgcga aagcgctatt ttaccaacga agaatctgtg cttcattttt gt#aaaacaaa   1620aatgcaacgc gagagcgcta atttttcaaa caaagaatct gagctgcatt tt#tacagaac   1680agaaatgcaa cgcgagagcg ctattttacc aacaaagaat ctatacttct tt#tttgttct   1740acaaaaatgc atcccgagag cgctattttt ctaacaaagc atcttagatt ac#tttttttc   1800tcctttgtgc gctctataat gcagtctctt gataactttt tgcactgtag gt#ccgttaag   1860gttagaagaa ggctactttg gtgtctattt tctcttccat aaaaaaagcc tg#actccact   1920tcccgcgttt actgattact agcgaagctg cgggtgcatt ttttcaagat aa#aggcatcc   1980ccgattatat tctataccga tgtggattgc gcatactttg tgaacagaaa gt#gatagcgt   2040tgatgattct tcattggtca gaaaattatg aacggtttct tctattttgt ct#ctatatac   2100tacgtatagg aaatgtttac attttcgtat tgttttcgat tcactctatg aa#tagttctt   2160actacaattt ttttgtctaa agagtaatac tagagataaa cataaaaaat gt#agaggtcg   2220agtttagatg caagttcaag gagcgaaagg tggatgggta ggttatatag gg#atatagca   2280cagagatata tagcaaagag atacttttga gcaatgtttg tggaagcggt at#tcgcaata   2340ttttagtagc tcgttacagt ccggtgcgtt tttggttttt tgaaagtgcg tc#ttcagagc   2400gcttttggtt ttcaaaagcg ctctgaagtt cctatacttt ctagagaata gg#aacttcgg   2460aataggaact tcaaagcgtt tccgaaaacg agcgcttccg aaaatgcaac gc#gagctgcg   2520cacatacagc tcactgttca cgtcgcacct atatctgcgt gttgcctgta ta#tatatata   2580catgagaaga acggcatagt gcgtgtttat gcttaaatgc gtacttatat gc#gtctattt   2640atgtaggatg aaaggtagtc tagtacctcc tgtgatatta tcccattcca tg#cggggtat   2700cgtatgcttc cttcagcact accctttagc tgttctatat gctgccactc ct#caattgga   2760ttagtctcat ccttcaatgc tatcatttcc tttgatattg gatcatatta ag#aaaccatt   2820attatcatga cattaaccta taaaaatagg cgtatcacga ggccctttcg tc#tcgcgcgt   2880ttcggtgatg acggtgaaaa cctctgacac atgcagctcc cggagacggt ca#cagcttgt   2940ctgtaagcgg atgccgggag cagacaagcc cgtcagggcg cgtcagcggg tg#ttggcggg   3000tgtcggggct ggcttaacta tgcggcatca gagcagattg tactgagagt 
gc#accataga   3060tcaacgacat tactatatat ataatatagg aagcatttaa tagacagcat cg#taatatat   3120gtgtactttg cagttatgac gccagatggc agtagtggaa gatattcttt at#tgaaaaat   3180agcttgtcac cttacgtaca atcttgatcc ggagcttttc tttttttgcc ga#ttaagaat   3240taattcggtc gaaaaaagaa aaggagaggg ccaagaggga gggcattggt ga#ctattgag   3300cacgtgagta tacgtgatta agcacacaaa ggcagcttgg agtatgtctg tt#attaattt   3360cacaggtagt tctggtccat tggtgaaagt ttgcggcttg cagagcacag ag#gccgcaga   3420atgtgctcta gattccgatg ctgacttgct gggtattata tgtgtgccca at#agaaagag   3480aacaattgac ccggttattg caaggaaaat ttcaagtctt gtaaaagcat at#aaaaatag   3540ttcaggcact ccgaaatact tggttggcgt gtttcgtaat caacctaagg ag#gatgtttt   3600ggctctggtc aatgattacg gcattgatat cgtccaactg catggagatg ag#tcgtggca   3660agaataccaa gagttcctcg gtttgccagt tattaaaaga ctcgtatttc ca#aaagactg   3720caacatacta ctcagtgcag cttcacagaa acctcattcg tttattccct tg#tttgattc   3780agaagcaggt gggacaggtg aacttttgga ttggaactcg atttctgact gg#gttggaag   3840gcaagagagc cccgaaagct tacattttat gttagctggt ggactgacgc ca#gaaaatgt   3900tggtgatgcg cttagattaa atggcgttat tggtgttgat gtaagcggag gt#gtggagac   3960aaatggtgta aaagactcta acaaaatagc aaatttcgtc aaaaatgcta ag#aaataggt   4020tattactgag tagtatttat ttaagtattg tttgtgcact tgccgatcac ta#tggccatt   4080taatgtaaat acttaagaaa aaaaaccaaa ttaattttga tacatgctgc at#gtgaagac   4140ccccgctgac gggtagtcaa tcactcagag gagaccctcc caaggcagcg ag#accacaag   4200tcggaaatga aagacccccg ctgacgggta gtcaatcact cagaggagac cc#tcccaagg   4260aacagcgaga ccacaagtcg gatgcaactg caagagggtt tattggatac ac#gggtaccc   4320gggcgactca gtcaatcgga ggactggcgc cccgagtgag gggttgtggg ct#cttttatt   4380gagctcgggg agcagaagcg cgcgaacaga agcgagaagc gaactgattg gt#tagttcaa   4440ataaggcaca gggtcatttc aggtccttgg ggcaccctgg aaacatctga tg#gttctcta   4500gaaactgctg agggctggac cgcatctggg gaccatctgt tcttggccct ga#gccggggc   4560aggaactgct taccacagat atcctgtttg gcccatattc agctgttcca tc#tgttcttg   4620gccctgagcc ggggcaggaa ctgcttacca cagatatcct gtttggccca ta#ttcagctg   4680ttccatctgt 
tcctgacctt gatctgaact tctctattct cagttatgta tt#tttccatg   4740ccttgcaaaa tggcgttact taagctagct tgccaaacct acaggtgggg tc#tttcattc   4800cccccttttt ctggagacta aataaaatct tttattttat cgtcgatcga ct#agatcttc   4860aatattggcc attagccata ttattcattg gttatatagc ataaatcaat at#tggctatt   4920ggccattgca tacgttgtat ctatatcata atatgtacat ttatattggc tc#atgtccaa   4980tatgaccgcc atgttggcat tgattattga ctagttatta atagtaatca at#tacggggt   5040cattagttca tagcccatat atggagttcc gcgttacata acttacggta aa#tggcccgc   5100ctggctgacc gcccaacgac ccccgcccat tgacgtcaat aatgacgtat gt#tcccatag   5160taacgccaat agggactttc cattgacgtc aatgggtgga gtatttacgg ta#aactgccc   5220acttggcagt acatcaagtg tatcatatgc caagtccgcc ccctattgac gt#caatgacg   5280gtaaatggcc cgcctggcat tatgcccagt acatgacctt acgggacttt cc#tacttggc   5340agtacatcta cgtattagtc atcgctatta ccatggtgat gcggttttgg ca#gtacacca   5400atgggcgtgg atagcggttt gactcacggg gatttccaag tctccacccc at#tgacgtca   5460atgggagttt gttttggcac caaaatcaac gggactttcc aaaatgtcgt aa#caactgcg   5520atcgcccgcc ccgttgacgc aaatgggcgg taggcgtgta cggtgggagg tc#tatataag   5580cagagctcgt ttagtgaacc gtcagatcac tagaagcttt attgcggtag tt#tatcacag   5640ttaaattgct aacgcagtca gtgcttctga cacaacagtc tcgaacttaa gc#tgcagtga   5700ctctcttaag gtagccttgc agaagttggt cgtgaggcac tgggcaggta ag#tatcaagg   5760ttacaagaca ggtttaagga gaccaataga aactgggctt gtcgagacag ag#aagactct   5820tgcgtttctg ataggcacct attggtctta ctgacatcca ctttgccttt ct#ctccacag   5880gtgtccactc ccagttcaat tacagctctt aaggctagag tacttaatac ga#ctcactat   5940aggctagcct cgagccgcca ccatggaatt cacgtgcatg caggccttaa tt#aagtcgac   6000acgttatttt ccaccatatt gccgtctttt ggcaatgtga gggcccggaa ac#ctggccct   6060gtcttcttga cgagcattcc taggggtctt tcccctctcg ccaaaggaat gc#aaggtctg   6120ttgaatgtcg tgaaggaagc agttcctctg gaagcttctt gaagacaaac aa#cgtctgta   6180gcgacccttt gcaggcagcg gaacccccca cctggcgaca ggtgcctctg cg#gccaaaag   6240ccacgtgtat aagatacacc tgcaaaggcg gcacaacccc agtgccacgt tg#tgagttgg   6300atagttgtgg aaagagtcaa atggctctcc 
tcaagcgtat tcaacaaggg gc#tgaaggat   6360gcccagaagg taccccattg tatgggatct gatctggggc ctcggtgcac at#gctttaca   6420tgtgtttagt cgaggttaaa aaacgtctag gccccccgaa ccacggggac gt#ggttttcc   6480tttgaaaaac acgatgataa tatgggggat ccaccggtcg ccaccatggt ga#gcaagggc   6540gaggagctgt tcaccggggt ggtgcccatc ctggtcgagc tggacggcga cg#taaacggc   6600cacaagttca gcgtgtccgg cgagggcgag ggcgatgcca cctacggcaa gc#tgaccctg   6660aagttcatct gcaccaccgg caagctgccc gtgccctggc ccaccctcgt ga#ccaccctg   6720acctacggcg tgcagtgctt cagccgctac cccgaccaca tgaagcagca cg#acttcttc   6780aagtccgcca tgcccgaagg ctacgtccag gagcgcacca tcttcttcaa gg#acgacggc   6840aactacaaga cccgcgccga ggtgaagttc gagggcgaca ccctggtgaa cc#gcatcgag   6900ctgaagggca tcgacttcaa ggaggacggc aacatcctgg ggcacaagct gg#agtacaac   6960tacaacagcc acaacgtcta tatcatggcc gacaagcaga agaacggcat ca#aggtgaac   7020ttcaagatcc gccacaacat cgaggacggc agcgtgcagc tcgccgacca ct#accagcag   7080aacaccccca tcggcgacgg ccccgtgctg ctgcccgaca accactacct ga#gcacccag   7140tccgccctga gcaaagaccc caacgagaag cgcgatcaca tggtcctgct gg#agttcgtg   7200accgccgccg ggatcactct cggcatggac gagctgtaca agtaaagcgg cc#gctcgacg   7260ataaattccc tttagtgagg gttaatgctt cgagcagaca tgataagata ca#ttgatgag   7320tttggacaaa ccacaactag aatgcagtga aaaaaatgct ctatttgtga aa#tttgtgat   7380gctattgctg tatttgtaac cattataagc tgcaataaac aagttaacaa ca#acaattgc   7440attcatttta tgtttcaggt tcagggggag atgtgggagg ttttttaaag ca#agtaaaac   7500ctctacaaat gtggtaaaat ccgataagga tccggcagtc tagaggatgg tc#cacccccg   7560gggtcggcag ccttcacgtg ggcggcgtgt atccaagctg cgatgccgtc ta#ctttgagg   7620gcggtggggg tggtcagcag gactgtgtaa ggtcctttcc agcgaggttc ta#ggttctta   7680gtctggtgtc ggcggaccca cactgtgtcg ccgactcggt aagggtgagg ta#ccaccggt   7740cggtccagtt gttcttggta cgtgccgcca gaggtctcca gacttcgtgc tg#gactaagt   7800agagagcctg taagtgagct tggagagagg ggctgttagt aactcttgtc at#gtcagggt   7860cagggaagtt tacaaggggc gggggtgccc catataagat ctcatatggc ca#tatggggg   7920cgcctagaga aggagtgagg gctggataaa gggaggatcg aggcggggtc 
ga#acgaggag   7980gttcaagggg gagagacggg gcggatggag gaagaggagg cggaggctta gg#gtgtacaa   8040agggcttgac ccagggaggg gggtcaaaag ccaaggcttc ccaggtcacg at#gtagggga   8100cctggtctgg gtgtccatgc gggccaggtg aaaagacctt gatcttaacc tg#ggtgatga   8160ggtctcggtt aaaggtgccg tctcgcggcc atccgacgtt aaaggttggc ca#ttctgcag   8220agcagaaggt aacccaacgt ctcttcttga catctaccga ctggttgtga gc#gatccgct   8280cgacatcttt ccagtgacct aaggtcaaac ttaagggagt ggtaacagtc tg#gcccgggc   8340ccatattttc agacaaatac agaaacacag tcagacagag acaacacaga ac#gatgctgc   8400agcagacaag acgcgcggcg cggcttcggt cccaaaccga aagcaaaaat tc#agacggag   8460gcgggaactg ttttaggttc tcgtctccta ccagaaccac atatccctcc tc#taaggggg   8520gtgcaccaaa gagtccaaaa cgatcgggat ttttggactc aggtcgggcc ac#aaaaacgg   8580cccccgaagt ccctgggacg tctcccaggg ttgcggccgg gtgttccgaa ct#cgtcagtt   8640ccaccacggg tccgccagat acagagctag ttagctaact agtaccgacg ca#ggcgcata   8700aaatcagtca tagacactag acaatcggac agacacagat aagttgctgg cc#agcttacc   8760tcccggtggt gggtcggtgg tccctgggca ggggtctccc gatcccggac ga#gcccccaa   8820atgaaagacc cccgctgacg ggtagtcaat cactcagagg agaccctccc aa#ggaacagc   8880gagaccacaa gtcggatgca actgcaagag ggtttattgg atacacgggt ac#ccgggcga   8940ctcagtcaat cggaggactg gcgccccgag tgaggggttg tgggctcttt ta#ttgagctc   9000ggggagcaga agcgcgcgaa cagaagcgag aagcgaactg attggttagt tc#aaataagg   9060cacagggtca tttcaggtcc ttggggcacc ctggaaacat ctgatggttc tc#tagaaact   9120gctgagggct ggaccgcatc tggggaccat ctgttcttgg ccctgagccg gg#gcaggaac   9180tgcttaccac agatatcctg tttggcccat attcagctgt tccatctgtt ct#tggccctg   9240agccggggca ggaactgctt accacagata tccgctttgg cccatattca gc#tgttccat   9300ctgttcctga ccttgatctg aacttttcta ttctcagtta tgtatttttc ca#tgccttgc   9360aaaatggcgt tacttaagct agcttgccaa acctacaggt ggggtctttc ac#atgtatat   9420gtcaaaaata aaaatcaact aattgactag taattaatat gactggcata at#gggaaatt   9480gatcctgaca gatgcaaact ggcttctcag cagcgcattt atgttgtcaa ct#gaggaagg   9540aaacgttaat gacagaaact ctaagtaatt tccacgttta tctattttta tt#tatactag   9600ctttggtaac 
aggaatattg cagcattcat gcacattgaa acccttatga aa#taaaaaca   9660tctgtgcatt taaaatggaa ttaacatttt aaatgttaaa aaaagctggc tt#agcttccc   9720cccgccccct agggcataga acaagtcaaa tgctttatat atttgagttt gg#gatgtatt   9780aggaaactcc taagagcaaa gctgttcttg aagacgaaag ggcctcgtga ta#cgcctatt   9840tttataggtt aatgtcatga gacaataacc ctgataaatg cttcaataat at#tgaaaaag   9900gaagagtatg agtattcaac atttccgtgt cgcccttatt cccttttttg cg#gcattttg   9960ccttcctgtt tttgctcacc cagaaacgct ggtgaaagta aaagatgctg aa#gatcagtt  10020gggtgcacga gtgggttaca tcgaactgga tctcaacagc ggtaagatcc tt#gagagttt  10080 tcgccccgaa gaacgttttc             #                  #                10100 <210> SEQ ID NO 2 <211> LENGTH: 9687<212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE:<223> OTHER INFORMATION: Description of Artificial  #Sequence:      constructed vectors <400> SEQUENCE: 2tcaatattgg ccattagcca tattattcat tggttatata gcataaatca at#attggcta     60ttggccattg catacgttgt atctatatca taatatgtac atttatattg gc#tcatgtcc    120aatatgaccg ccatgttggc attgattatt gactagttat taatagtaat ca#attacggg    180gtcattagtt catagcccat atatggagtt ccgcgttaca taacttacgg ta#aatggccc    240gcctggctga ccgcccaacg acccccgccc attgacgtca ataatgacgt at#gttcccat    300agtaacgcca atagggactt tccattgacg tcaatgggtg gagtatttac gg#taaactgc    360ccacttggca gtacatcaag tgtatcatat gccaagtccg ccccctattg ac#gtcaatga    420cggtaaatgg cccgcctggc attatgccca gtacatgacc ttacgggact tt#cctacttg    480gcagtacatc tacgtattag tcatcgctat taccatggtg atgcggtttt gg#cagtacac    540caatgggcgt ggatagcggt ttgactcacg gggatttcca agtctccacc cc#attgacgt    600caatgggagt ttgttttggc accaaaatca acgggacttt ccaaaatgtc gt#aacaactg    660cgatcgcccg ccccgttgac gcaaatgggc ggtaggcgtg tacggtggga gg#tctatata    720agcagagctc gtttagtgaa ccgtcagatc actagaagct ttattgcggt ag#tttatcac    780agttaaattg ctaacgcagt cagtgcttct gacacaacag tctcgaactt aa#gctgcagt    840gactctctta aggtagcctt gcagaagttg gtcgtgaggc actgggcagg ta#agtatcaa    900ggttacaaga caggtttaag gagaccaata gaaactgggc 
ttgtcgagac ag#agaagact    960cttgcgtttc tgataggcac ctattggtct tactgacatc cactttgcct tt#ctctccac   1020aggtgtccac tcccagttca attacagctc ttaaggctag agtacttaat ac#gactcact   1080ataggctagc cgccaccatg gcttacccat acgatgttcc agattacgct gg#gcaaccag   1140acatgtcctt gaatgtcatt aagatgaaat ccagtgactt cctggagagt gc#agaactgg   1200acagcggagg ctttgggaag gtgtctctgt gtttccacag aacccaggga ct#catgatca   1260tgaaaacagt gtacaagggg cccaactgca ttgagcacaa cgaggccctc tt#ggaggagg   1320cgaagatgat gaacagactg agacacagcc gggtggtgaa gctcctgggc gt#catcatag   1380aggaagggaa gtactccctg gtgatggagt acatggagaa gggcaacctg at#gcacgtgc   1440tgaaagccga gatgagtact ccgctttctg taaaaggaag gataatttgg ga#aatcattg   1500aaggaatgtg ctacttacat gaaaaggcgt gatacacaag gacctgaagc ct#gaaaatat   1560ccttgttgat aatgacttcc acattaagat cgcagacctc ggccttgcct cc#tttaagat   1620gtggagcaaa ctgaataatg aagagcacaa tgagctgagg gaagtggacg gc#accgctaa   1680gaagaatggc ggcaccctct actacatggc gcccgagcac ctgaatgacg tc#aacgcaaa   1740gcccacagag aagtcggatg tgtacagctt tgctgtagta ctctgggcga ta#tttgcaaa   1800taaggagcca tatgaaaatg ctatctgtga gcagcagttg ataatgtgca ta#aaatctgg   1860gaacaggcca gatgtggatg acatcactga gtactgccca agagaaatta tc#agtctcat   1920gaagctctgc tgggaagcga atccggaagc tcggccgaca tttcctggca tt#gaagaaaa   1980atttaggcct ttttatttaa gtcaattaga agaaagtgta gaagaggacg tg#aagagttt   2040aaagaaagag tattcaaacg aaaatgcagt tgtgaagaga atgcagtctc tt#caacttga   2100ttgtgtggca gtaccttcaa gccggtcaaa ttcagccaca gaacagcctg gt#tcactgca   2160cagttcccag ggacttggga tgggtcctgt ggaggagtcc tggtttgctc ct#tccctgga   2220gcacccacaa gaagagaatg agcccagcct gcagagtaaa ctccaagacg aa#gccaacta   2280ccatctttat ggcagccgca tggacaggca gacgaaacag cagcccagac ag#aatgtggc   2340ttacaacaga gaggaggaaa ggagacgcag ggtctcccat gacccttttg ca#cagcaaag   2400accttacgag aattttcaga atacagaggg aaaaggcact gtttattcca gt#gcagccag   2460tcatggtaat gcagtgcacc agccctcagg gctcaccagc caacctcaag ta#ctgtatca   2520gaacaatgga ttatatagct cacatggctt tggaacaaga ccactggatc ca#ggaacagc   
2580aggtcccaga gtttggtaca ggccaattcc aagtcatatg cctagtctgc at#aatatccc   2640agtgcctgag accaactatc taggaaatac acccaccatg ccattcagct cc#ttgccacc   2700aacagatgaa tctataaaat ataccatata caatagtact ggcattcaga tt#ggagccta   2760caattatatg gagattggtg ggacgagttc atcactacta gacagcacaa at#acgaactt   2820caaagaagag ccagctgcta agtaccaagc tatctttgat aataccacta gt#ctgacgga   2880taaacacctg gacccaatca gggaaaatct gggaaagcac tggaaaaact gt#gcccgtaa   2940actgggcttc acacagtctc agattgatga aattgaccat gactatgagc ga#gatggact   3000gaaagaaaag gtttaccaga tgctccaaaa gtgggtgatg agggaaggca ta#aagggagc   3060cacggtgggg aagctggccc aggcgctcca ccagtgttcc aggatcgacc tt#ctgagcag   3120cttgatttac gtcagccaga actaacacgc gtggtacctc tagagtcgac ac#gttatttt   3180ccaccatatt gccgtctttt ggcaatgtga gggcccggaa acctggccct gt#cttcttga   3240cgagcattcc taggggtctt tcccctctcg ccaaaggaat gcaaggtctg tt#gaatgtcg   3300tgaaggaagc agttcctctg gaagcttctt gaagacaaac aacgtctgta gc#gacccttt   3360gcaggcagcg gaacccccca cctggcgaca ggtgcctctg cggccaaaag cc#acgtgtat   3420aagatacacc tgcaaaggcg gcacaacccc agtgccacgt tgtgagttgg at#agttgtgg   3480aaagagtcaa atggctctcc tcaagcgtat tcaacaaggg gctgaaggat gc#ccagaagg   3540taccccattg tatgggatct gatctggggc ctcggtgcac atgctttaca tg#tgtttagt   3600cgaggttaaa aaacgtctag gccccccgaa ccacggggac gtggttttcc tt#tgaaaaac   3660acgatgataa tatgggggat ccaccggtcg ccaccatggt gagcaagggc ga#ggagctgt   3720tcaccggggt ggtgcccatc ctggtcgagc tggacggcga cgtaaacggc ca#caagttca   3780gcgtgtccgg cgagggcgag ggcgatgcca cctacggcaa gctgaccctg aa#gttcatct   3840gcaccaccgg caagctgccc gtgccctggc ccaccctcgt gaccaccctg ac#ctacggcg   3900tgcagtgctt cagccgctac cccgaccaca tgaagcagca cgacttcttc aa#gtccgcca   3960tgcccgaagg ctacgtccag gagcgcacca tcttcttcaa ggacgacggc aa#ctacaaga   4020cccgcgccga ggtgaagttc gagggcgaca ccctggtgaa ccgcatcgag ct#gaagggca   4080tcgacttcaa ggaggacggc aacatcctgg ggcacaagct ggagtacaac ta#caacagcc   4140acaacgtcta tatcatggcc gacaagcaga agaacggcat caaggtgaac tt#caagatcc   4200gccacaacat cgaggacggc 
agcgtgcagc tcgccgacca ctaccagcag aa#caccccca   4260tcggcgacgg ccccgtgctg ctgcccgaca accactacct gagcacccag tc#cgccctga   4320gcaaagaccc caacgagaag cgcgatcaca tggtcctgct ggagttcgtg ac#cgccgccg   4380ggatcactct cggcatggac gagctgtaca agtaaagcgg ccgcttccct tt#agtgaggg   4440ttaatgcttc gagcagacat gataagatac attgatgagt ttggacaaac ca#caactaga   4500atgcagtgaa aaaaatgctt tatttgtgaa atttgtgatg ctattgcttt at#ttgtaacc   4560attataagct gcaataaaca agttaacaac aacaattgca ttcattttat gt#ttcaggtt   4620cagggggaga tgtgggaggt tttttaaagc aagtaaaacc tctacaaatg tg#gtaaaatc   4680cgataaggat cgatccgggc tggcgtaata gcgaagaggc ccgcaccgat cg#cccttccc   4740aacagttgcg cagcctgaat ggcgaatgga cgcgccctgt agcggcgcat ta#agcgcggc   4800gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccctag cg#cccgctcc   4860tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc aa#gctctaaa   4920tcgggggctc cctttagggt tccgatttag agctttacgg cacctcgacc gc#aaaaaact   4980tgatttgggt gatgctcgaa cgaagcatct gtgcttcatt ttgtagaaca aa#aatgcaac   5040gcgagagcgc taatttttca aacaaagaat ctgagctgca tttttacaga ac#agaaatgc   5100aacgcgaaag cgctatttta ccaacgaaga atctgtgctt catttttgta aa#acaaaaat   5160gcaacgcgag agcgctaatt tttcaaacaa agaatctgag ctgcattttt ac#agaacaga   5220aatgcaacgc gagagcgcta ttttaccaac aaagaatcta tacttctttt tt#gttctaca   5280aaaatgcatc ccgagagcgc tatttttcta acaaagcatc ttagattact tt#ttttctcc   5340tttgtgcgct ctataatgca gtctcttgat aactttttgc actgtaggtc cg#ttaaggtt   5400agaagaaggc tactttggtg tctattttct cttccataaa aaaagcctga ct#ccacttcc   5460cgcgtttact gattactagc gaagctgcgg gtgcattttt tcaagataaa gg#catccccg   5520attatattct ataccgatgt ggattgcgca tactttgtga acagaaagtg at#agcgttga   5580tgattcttca ttggtcagaa aattatgaac ggtttcttct attttgtctc ta#tatactac   5640gtataggaaa tgtttacatt ttcgtattgt tttcgattca ctctatgaat ag#ttcttact   5700acaatttttt tgtctaaaga gtaatactag agataaacat aaaaaatgta ga#ggtcgagt   5760ttagatgcaa gttcaaggag cgaaaggtgg atgggtaggt tatataggga ta#tagcacag   5820agatatatag caaagagata cttttgagca atgtttgtgg 
aagcggtatt cg#caatattt   5880tagtagctcg ttacagtccg gtgcgttttt ggttttttga aagtgcgtct tc#agagcgct   5940tttggttttc aaaagcgctc tgaagttcct atactttcta gagaatagga ac#ttcggaat   6000aggaacttca aagcgtttcc gaaaacgagc gcttccgaaa atgcaacgcg ag#ctgcgcac   6060atacagctca ctgttcacgt cgcacctata tctgcgtgtt gcctgtatat at#atatacat   6120gagaagaacg gcatagtgcg tgtttatgct taaatgcgta cttatatgcg tc#tatttatg   6180taggatgaaa ggtagtctag tacctcctgt gatattatcc cattccatgc gg#ggtatcgt   6240atgcttcctt cagcactacc ctttagctgt tctatatgct gccactcctc aa#ttggatta   6300gtctcatcct tcaatgctat catttccttt gatattggat catattaaga aa#ccattatt   6360atcatgacat taacctataa aaataggcgt atcacgaggc cctttcgtct cg#cgcgtttc   6420ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg agacggtcac ag#cttgtctg   6480taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt tg#gcgggtgt   6540cggggctggc ttaactatgc ggcatcagag cagattgtac tgagagtgca cc#atagatca   6600acgacattac tatatatata atataggaag catttaatag acagcatcgt aa#tatatgtg   6660tactttgcag ttatgacgcc agatggcagt agtggaagat attctttatt ga#aaaatagc   6720ttgtcacctt acgtacaatc ttgatccgga gcttttcttt ttttgccgat ta#agaattaa   6780ttcggtcgaa aaaagaaaag gagagggcca agagggaggg cattggtgac ta#ttgagcac   6840gtgagtatac gtgattaagc acacaaaggc agcttggagt atgtctgtta tt#aatttcac   6900aggtagttct ggtccattgg tgaaagtttg cggcttgcag agcacagagg cc#gcagaatg   6960tgctctagat tccgatgctg acttgctggg tattatatgt gtgcccaata ga#aagagaac   7020aattgacccg gttattgcaa ggaaaatttc aagtcttgta aaagcatata aa#aatagttc   7080aggcactccg aaatacttgg ttggcgtgtt tcgtaatcaa cctaaggagg at#gttttggc   7140tctggtcaat gattacggca ttgatatcgt ccaactgcat ggagatgagt cg#tggcaaga   7200ataccaagag ttcctcggtt tgccagttat taaaagactc gtatttccaa aa#gactgcaa   7260catactactc agtgcagctt cacagaaacc tcattcgttt attcccttgt tt#gattcaga   7320agcaggtggg acaggtgaac ttttggattg gaactcgatt tctgactggg tt#ggaaggca   7380agagagcccc gaaagcttac attttatgtt agctggtgga ctgacgccag aa#aatgttgg   7440tgatgcgctt agattaaatg gcgttattgg tgttgatgta agcggaggtg tg#gagacaaa   
7500tggtgtaaaa gactctaaca aaatagcaaa tttcgtcaaa aatgctaaga aa#taggttat   7560tactgagtag tatttattta agtattgttt gtgcacttgc cgatcgcgta tg#gtgcactc   7620tcagtacaat ctgctctgat gccgcatagt taagccagcc ccgacacccg cc#aacacccg   7680ctgacgcgcc ctgacgggct tgtctgctcc cggcatccgc ttacagacaa gc#tgtgaccg   7740tctccgggag ctgcatgtgt cagaggtttt caccgtcatc accgaaacgc gc#gagacgaa   7800agggcctcgt gatacgccta tttttatagg ttaatgtcat gataataatg gt#ttcttaga   7860cgtcaggtgg cacttttcgg ggaaatgtgc gcggaacccc tatttgttta tt#tttctaaa   7920tacattcaaa tatgtatccg ctcatgagac aataaccctg ataaatgctt ca#ataatatt   7980gaaaaaggaa gagtatgagt attcaacatt tccgtgtcgc ccttattccc tt#ttttgcgg   8040cattttgcct tcctgttttt gctcacccag aaacgctggt gaaagtaaaa ga#tgctgaag   8100atcagttggg tgcacgagtg ggttacatcg aactggatct caacagcggt aa#gatccttg   8160agagttttcg ccccgaagaa cgttttccaa tgatgagcac ttttaaagtt ct#gctatgtg   8220gcgcggtatt atcccgtatt gacgccgggc aagagcaact cggtcgccgc at#acactatt   8280ctcagaatga cttggttgag tactcaccag tcacagaaaa gcatcttacg ga#tggcatga   8340cagtaagaga attatgcagt gctgccataa ccatgagtga taacactgcg gc#caacttac   8400ttctgacaac gatcggagga ccgaaggagc taaccgcttt tttgcacaac at#gggggatc   8460atgtaactcg ccttgatcgt tgggaaccgg agctgaatga agccatacca aa#cgacgagc   8520gtgacaccac gatgcctgta gcaatggcaa caacgttgcg caaactatta ac#tggcgaac   8580tacttactct agcttcccgg caacaattaa tagactggat ggaggcggat aa#agttgcag   8640gaccacttct gcgctcggcc cttccggctg gctggtttat tgctgataaa tc#tggagccg   8700gtgagcgtgg gtctcgcggt atcattgcag cactggggcc agatggtaag cc#ctcccgta   8760tcgtagttat ctacacgacg gggagtcagg caactatgga tgaacgaaat ag#acagatcg   8820ctgagatagg tgcctcactg attaagcatt ggtaactgtc agaccaagtt ta#ctcatata   8880tactttagat tgatttaaaa cttcattttt aatttaaaag gatctaggtg aa#gatccttt   8940ttgataatct catgaccaaa atcccttaac gtgagttttc gttccactga gc#gtcagacc   9000ccgtagaaaa gatcaaagga tcttcttgag atcctttttt tctgcgcgta at#ctgctgct   9060tgcaaacaaa aaaaccaccg ctaccagcgg tggtttgttt gccggatcaa ga#gctaccaa   9120ctctttttcc gaaggtaact 
ggcttcagca gagcgcagat accaaatact gt#ccttctag   9180tgtagccgta gttaggccac cacttcaaga actctgtagc accgcctaca ta#cctcgctc   9240tgctaatcct gttaccagtg gctgctgcca gtggcgataa gtcgtgtctt ac#cgggttgg   9300actcaagacg atagttaccg gataaggcgc agcggtcggg ctgaacgggg gg#ttcgtgca   9360cacagcccag cttggagcga acgacctaca ccgaactgag atacctacag cg#tgagctat   9420gagaaagcgc cacgcttccc gaagggagaa aggcggacag gtatccggta ag#cggcaggg   9480tcggaacagg agagcgcacg agggagcttc cagggggaaa cgcctggtat ct#ttatagtc   9540ctgtcgggtt tcgccacctc tgacttgagc gtcgattttt gtgatgctcg tc#aggggggc   9600ggagcctatg gaaaaacgcc agcaacgcgg cctttttacg gttcctggcc tt#ttgctggc   9660 cttttgctca catggctcga cagatct          #                   #           9687 <210> SEQ ID NO 3 <211> LENGTH: 83<212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE:<223> OTHER INFORMATION: Description of Artificial #Sequence:  PCR primer       for RID <400> SEQUENCE: 3acgactcact ataggctagc cgccaccatg gcttacccat acgatgttcc ag#attacgct     60 gggcaaccag acatgtcctt gaa           #                   #                83 <210> SEQ ID NO 4<211> LENGTH: 80 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence<220> FEATURE: <223> OTHER INFORMATION: Description of Artificial #Sequence:  PCR primer       for RID <400> SEQUENCE: 4ttgccaaaag acggcaatat ggtggaaaat aacgtgtcga ctctagaggt ac#cacgcgtg     60 ttagttctgg ctgacgtaaa             #                  #                   # 80 <210> SEQ ID NO 5 <211> LENGTH: 8614<212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE:<223> OTHER INFORMATION: Description of Artificial  #Sequence:      constructed vectors <400> SEQUENCE: 5atcacgaggc cctttcgtct tcaagaacag ctttgctctt aggagtttcc ta#atacatcc     60caaactcaaa tatataaagc atttgacttg ttctatgccc tagttattaa ta#gtaatcaa    120ttacggggtc attagttcat agcccatata tggagttccg cgttacataa ct#tacggtaa    180atggcccgcc tggctgaccg cccaacgacc cccgcccatt gacgtcaata at#gacgtatg    240ttcccatagt aacgccaata gggactttcc attgacgtca atgggtggag ta#tttacggt    
300aaactgccca cttggcagta catcaagtgt atcatatgcc aagtacgccc cc#tattgacg    360tcaatgacgg taaatggccc gcctggcatt atgcccagta catgacctta tg#ggactttc    420ctacttggca gtacatctac gtattagtca tcgctattac catggtgatg cg#gttttggc    480agtacatcaa tgggcgtgga tagcggtttg actcacgggg atttccaagt ct#ccacccca    540ttgacgtcaa tgggagtttg ttttggcacc aaaatcaacg ggactttcca aa#atgtcgta    600acaactccgc cccattgacg caaatgggcg gtaggcatgt acggtgggag gt#ctatataa    660gcagagctca ataaaagagc ccacaacccc tcactcgggg cgccagtcct cc#gattgact    720gagtcgcccg ggtacccgtg tatccaataa accctcttgc agttgcatcc ga#cttgtggt    780ctcgctgttc cttgggaggg tctcctctga gtgattgact acccgtcagc gg#gggtcttt    840catttggggg ctcgtccggg atcgggagac ccctgcccag ggaccaccga cc#caccaccg    900ggaggtaagc tggccagcaa cttatctgtg tctgtccgat tgtctagtgt ct#atgactga    960ttttatgcgc ctgcgtcggt actagttagc taactagctc tgtatctggc gg#acccgtgg   1020tggaactgac gagttcggaa cacccggccg caaccctggg agacgtccca gg#gacttcgg   1080gggccgtttt tgtggcccga cctgagtcca aaaatcccga tcgttttgga ct#ctttggtg   1140cacccccctt agaggaggga tatgtggttc tggtaggaga cgagaaccta aa#acagttcc   1200cgcctccgtc tgaatttttg ctttcggttt gggaccgaag ccgcgccgcg cg#tcttgtct   1260gctgcagcat cgttctgtgt tgtctctgtc tgactgtgtt tctgtatttg tc#tgaaaata   1320tcggcccggg ccagactgtt accactccct taagtttgac cttaggtcac tg#gaaagatg   1380tcgagcggat cgctcacaac cagtcggtag atgtcaagaa gagacgttgg gt#taccttct   1440gctctgcaga atggccaacc tttaacgtcg gatggccgcg agacggcacc tt#taaccgag   1500acctcatcac ccaggttaag atcaaggtct tttcacctgg cccgcatgga ca#cccagacc   1560aggtccccta catcgtgacc tgggaagcct tggcttttga cccccctccc tg#ggtcaagc   1620cctttgtaca ccctaagcct ccgcctcctc ttcctccatc cgccccgtct ct#cccccttg   1680aacctcctcg ttcgaccccg cctcgatcct ccctttatcc agccctcact cc#ttctctag   1740gcgcccccat atggccatat gagatcttat atggggcacc cccgcccctt gt#aaacttcc   1800ctgaccctga catgacaaga gttactaaca gcccctctct ccaagctcac tt#acaggctc   1860tctacttagt ccagcacgaa gtctggagac ctctggcggc agcctaccaa ga#acaactgg   1920accgaccggt ggtacctcac 
ccttaccgag tcggcgacac agtgtgggtc cg#ccgacacc   1980agactaagaa cctagaacct cgctggaaag gaccttacac agtcctgctg ac#caccccca   2040ccgccctcaa agtagacggc atcgcagctt ggatacacgc cgcccacgtg aa#ggctgccg   2100accccggggg tggaccatcc tctagactgc cggatctcga gggatccacc ac#catggacc   2160cccattaaat tggaattcct gcagcccggg ggatccacta gttctagagc ga#attaattc   2220cggttatttt ccaccatatt gccgtctttt ggcaatgtga gggcccggaa ac#ctggccct   2280gtcttcttga cgagcattcc taggggtctt tcccctctcg ccaaaggaat gc#aaggtctg   2340ttgaatgtcg tgaaggaagc agttcctctg gaagcttctt gaagacaaac aa#cgtctgta   2400gcgacccttt gcaggcagcg gaacccccca cctggcgaca ggtgcctctg cg#gccaaaag   2460ccacgtgtat aagatacacc tgcaaaggcg gcacaacccc agtgccacgt tg#tgagttgg   2520atagttgtgg aaagagtcaa atggctctcc tcaagcgtat tcaacaaggg gc#tgaaggat   2580gcccagaagg taccccattg tatgggatct gatctggggc ctcggtgcac at#gctttaca   2640tgtgtttagt cgaggttaaa aaacgtctag gccccccgaa ccacggggac gt#ggttttcc   2700tttgaaaaac acgatgataa tatgggggat ccaccggtcg ccaccatggt ga#gcaagggc   2760gaggagctgt tcaccggggt ggtgcccatc ctggtcgagc tggacggcga cg#taaacggc   2820cacaagttca gcgtgtccgg cgagggcgag ggcgatgcca cctacggcaa gc#tgaccctg   2880aagttcatct gcaccaccgg caagctgccc gtgccctggc ccaccctcgt ga#ccaccctg   2940acctacggcg tgcagtgctt cagccgctac cccgaccaca tgaagcagca cg#acttcttc   3000aagtccgcca tgcccgaagg ctacgtccag gagcgcacca tcttcttcaa gg#acgacggc   3060aactacaaga cccgcgccga ggtgaagttc gagggcgaca ccctggtgaa cc#gcatcgag   3120ctgaagggca tcgacttcaa ggaggacggc aacatcctgg ggcacaagct gg#agtacaac   3180tacaacagcc acaacgtcta tatcatggcc gacaagcaga agaacggcat ca#aggtgaac   3240ttcaagatcc gccacaacat cgaggacggc agcgtgcagc tcgccgacca ct#accagcag   3300aacaccccca tcggcgacgg ccccgtgctg ctgcccgaca accactacct ga#gcacccag   3360tccgccctga gcaaagaccc caacgagaag cgcgatcaca tggtcctgct gg#agttcgtg   3420accgccgccg ggatcactct cggcatggac gagctgtaca agtaaagcgg cc#gctcgacg   3480ataaaataaa agattttatt tagtctccag aaaaaggggg gaatgaaaga cc#ccacctgt   3540aggtttggca agctagctta agtaacgcca ttttgcaagg 
catggaaaaa ta#cataactg   3600agaatagaga agttcagatc aaggtcagga acagatggaa cagctgaata tg#ggccaaac   3660aggatatctg tggtaagcag ttcctgcccc ggctcagggc caagaacaga tg#gaacagct   3720gaatatgggc caaacaggat atctgtggta agcagttcct gccccggctc ag#ggccaaga   3780acagatggtc cccagatgcg gtccagccct cagcagtttc tagagaacca tc#agatgttt   3840ccagggtgcc ccaaggacct gaaatgaccc tgtgccttat ttgaactaac ca#atcagttc   3900gcttctcgct tctgttcgcg cgcttctgct ccccgagctc aataaaagag cc#cacaaccc   3960ctcactcggg gcgccagtcc tccgattgac tgagtcgccc gggtacccgt gt#atccaata   4020aaccctcttg cagttgcatc cgacttgtgg tctcgctgtt ccttgggagg gt#ctcctctg   4080agtgattgac tacccgtcag cgggggtctt tcatttccga cttgtggtct cg#ctgccttg   4140ggagggtctc ctctgagtga ttgactaccc gtcagcgggg gtcttcacat gc#agcatgta   4200tcaaaattaa tttggttttt tttcttaagt atttacatta aatggccata gt#gatcggca   4260agtgcacaaa caatacttaa ataaatacta ctcagtaata acctatttct ta#gcattttt   4320gacgaaattt gctattttgt tagagtcttt tacaccattt gtctccacac ct#ccgcttac   4380atcaacacca ataacgccat ttaatctaag cgcatcacca acattttctg gc#gtcagtcc   4440accagctaac ataaaatgta agctttcggg gctctcttgc cttccaaccc ag#tcagaaat   4500cgagttccaa tccaaaagtt cacctgtccc acctgcttct gaatcaaaca ag#ggaataaa   4560cgaatgaggt ttctgtgaag ctgcactgag tagtatgttg cagtcttttg ga#aatacgag   4620tcttttaata actggcaaac cgaggaactc ttggtattct tgccacgact ca#tctccatg   4680cagttggacg atatcaatgc cgtaatcatt gaccagagcc aaaacatcct cc#ttaggttg   4740attacgaaac acgccaacca agtatttcgg agtgcctgaa ctatttttat at#gcttttac   4800aagacttgaa attttccttg caataaccgg gtcaattgtt ctctttctat tg#ggcacaca   4860tataataccc agcaagtcag catcggaatc tagagcacat tctgcggcct ct#gtgctctg   4920caagccgcaa actttcacca atggaccaga actacctgtg aaattaataa ca#gacatact   4980ccaagctgcc tttgtgtgct taatcacgta tactcacgtg ctcaatagtc ac#caatgccc   5040tccctcttgg ccctctcctt ttcttttttc gaccgaatta attcttaatc gg#caaaaaaa   5100gaaaagctcc ggatcaagat tgtacgtaag gtgacaagct atttttcaat aa#agaatatc   5160ttccactact gccatctggc gtcataactg caaagtacac atatattacg at#gctgtcta   
5220ttaaatgctt cctatattat atatatagta atgtcgttga tctatggtgc ac#tctcagta   5280caatctgctc tgatgccgca tagttaagcc agccccgaca cccgccaaca cc#cgctgacg   5340cgccctgacg ggcttgtctg ctcccggcat ccgcttacag acaagctgtg ac#cgtctccg   5400ggagctgcat gtgtcagagg ttttcaccgt catcaccgaa acgcgcgaga cg#aaagggcc   5460tcgtgatacg cctattttta taggttaatg tcatgataat aatggtttct ta#atatgatc   5520caatatcaaa ggaaatgata gcattgaagg atgagactaa tccaattgag ga#gtggcagc   5580atatagaaca gctaaagggt agtgctgaag gaagcatacg ataccccgca tg#gaatggga   5640taatatcaca ggaggtacta gactaccttt catcctacat aaatagacgc at#ataagtac   5700gcatttaagc ataaacacgc actatgccgt tcttctcatg tatatatata ta#caggcaac   5760acgcagatat aggtgcgacg tgaacagtga gctgtatgtg cgcagctcgc gt#tgcatttt   5820cggaagcgct cgttttcgga aacgctttga agttcctatt ccgaagttcc ta#ttctctag   5880aaagtatagg aacttcagag cgcttttgaa aaccaaaagc gctctgaaga cg#cactttca   5940aaaaaccaaa aacgcaccgg actgtaacga gctactaaaa tattgcgaat ac#cgcttcca   6000caaacattgc tcaaaagtat ctctttgcta tatatctctg tgctatatcc ct#atataacc   6060tacccatcca cctttcgctc cttgaacttg catctaaact cgacctctac at#tttttatg   6120tttatctcta gtattactct ttagacaaaa aaattgtagt aagaactatt ca#tagagtga   6180atcgaaaaca atacgaaaat gtaaacattt cctatacgta gtatatagag ac#aaaataga   6240agaaaccgtt cataattttc tgaccaatga agaatcatca acgctatcac tt#tctgttca   6300caaagtatgc gcaatccaca tcggtataga atataatcgg ggatgccttt at#cttgaaaa   6360aatgcacccg cagcttcgct agtaatcagt aaacgcggga agtggagtca gg#cttttttt   6420atggaagaga aaatagacac caaagtagcc ttcttctaac cttaacggac ct#acagtgca   6480aaaagttatc aagagactgc attatagagc gcacaaagga gaaaaaaagt aa#tctaagat   6540gctttgttag aaaaatagcg ctctcgggat gcatttttgt agaacaaaaa ag#aagtatag   6600attctttgtt ggtaaaatag cgctctcgcg ttgcatttct gttctgtaaa aa#tgcagctc   6660agattctttg tttgaaaaat tagcgctctc gcgttgcatt tttgttttac aa#aaatgaag   6720cacagattct tcgttggtaa aatagcgctt tcgcgttgca tttctgttct gt#aaaaatgc   6780agctcagatt ctttgtttga aaaattagcg ctctcgcgtt gcatttttgt tc#tacaaaat   6840gaagcacaga tgcttcgttc 
gagcaaaagg ccagcaaaag gccaggaacc gt#aaaaaggc   6900cgcgttgctg gcgtttttcc ataggctccg cccccctgac gagcatcaca aa#aatcgacg   6960ctcaagtcag aggtggcgaa acccgacagg actataaaga taccaggcgt tt#ccccctgg   7020aagctccctc gtgcgctctc ctgttccgac cctgccgctt accggatacc tg#tccgcctt   7080tctcccttcg ggaagcgtgg cgctttctca tagctcacgc tgtaggtatc tc#agttcggt   7140gtaggtcgtt cgctccaagc tgggctgtgt gcacgaaccc cccgttcagc cc#gaccgctg   7200cgccttatcc ggtaactatc gtcttgagtc caacccggta agacacgact ta#tcgccact   7260ggcagcagcc actggtaaca ggattagcag agcgaggtat gtaggcggtg ct#acagagtt   7320cttgaagtgg tggcctaact acggctacac tagaaggaca gtatttggta tc#tgcgctct   7380gctgaagcca gttaccttcg gaaaaagagt tggtagctct tgatccggca aa#caaaccac   7440cgctggtagc ggtggttttt ttgtttgcaa gcagcagatt acgcgcagaa aa#aaaggatc   7500tcaagaagat cctttgatct tttctacggg gtctgacgct cagtggaacg aa#aactcacg   7560ttaagggatt ttggtcatga gattatcaaa aaggatcttc acctagatcc tt#ttaaatta   7620aaaatgaagt ttgcgcaaat caatctaaag tatatatgag taaacttggt ct#gacagtta   7680ccaatgctta atcagtgagg cacctatctc agcgatctgt ctatttcgtt ca#tccatagt   7740tgcctgactc cccgtcgtgt agataactac gatacgggag ggcttaccat ct#ggccccag   7800tgctgcaatg ataccgcgag acccacgctc accggctcca gatttatcag ca#ataaacca   7860gccagccgga agggccgagc gcagaagtgg tcctgcaact ttatccgcct cc#atccagtc   7920tattaattgt tgccgggaag ctagagtaag tagttcgcca gttaatagtt tg#cgcaacgt   7980tgttgccatt gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg ct#tcattcag   8040ctccggttcc caacgatcaa ggcgagttac atgatccccc atgttgtgca aa#aaagcggt   8100tagctccttc ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt ta#tcactcat   8160ggttatggca gcactgcata attctcttac tgtcatgcca tccgtaagat gc#ttttctgt   8220gactggtgag tactcaacca agtcattctg agaatagtgt atgcggcgac cg#agttgctc   8280ttgcccggcg tcaacacggg ataataccgc gccacatagc agaactttaa aa#gtgctcat   8340cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt tg#agatccag   8400ttcgatgtaa cccactcgtg cacccaactg atcttcagca tcttttactt tc#accagcgt   8460ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa 
aagggaataa gggcgacacg   8520
gaaatgttga atactcatac tcttcctttt tcaatattat tgaagcattt atcagggtta   8580
ttgtctcatg acattaacct ataaaaatag gcgt   8614

<210> SEQ ID NO 6
<211> LENGTH: 61
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: coiled-coil presentation structure
<300> PUBLICATION INFORMATION:
<301> AUTHORS: Martin et al.,
<303> JOURNAL: EMBO J.
<304> VOLUME: 13
<305> ISSUE: 22
<306> PAGES: 5303-5309
<307> DATE: 1994
<400> SEQUENCE: 6
Met Gly Cys Ala Ala Leu Glu Ser Glu Val Ser Ala Leu Glu Ser Glu
1               5                   10                  15
Val Ala Ser Leu Glu Ser Glu Val Ala Ala Leu Gly Arg Gly Asp Met
            20                  25                  30
Pro Leu Ala Ala Val Lys Ser Lys Leu Ser Ala Val Lys Ser Lys Leu
        35                  40                  45
Ala Ser Val Lys Ser Lys Leu Ala Ala Cys Gly Pro Pro
    50                  55                  60

<210> SEQ ID NO 7
<211> LENGTH: 70
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: minibody presentation structure
<400> SEQUENCE: 7
Met Gly Arg Asn Ser Gln Ala Thr Ser Gly Phe Thr Phe Ser His Phe
1               5                   10                  15
Tyr Met Glu Trp Val Arg Gly Gly Glu Tyr Ile Ala Ala Ser Arg His
            20                  25                  30
Lys His Asn Lys Tyr Thr Thr Glu Tyr Ser Ala Ser Val Lys Gly Arg
        35                  40                  45
Tyr Thr Ile Val Ser Arg Asp Thr Ser Gln Ser Ile Leu Tyr Leu Gln
    50                  55                  60
Lys Lys Lys Gly Pro Pro
65                  70

<210> SEQ ID NO 8
<211> LENGTH: 7
<212> TYPE: PRT
<213> ORGANISM: Monkey virus
<300> PUBLICATION INFORMATION:
<301> AUTHORS: Kalderon,
<303> JOURNAL: Cell
<304> VOLUME: 39
<306> PAGES: 499-509
<307> DATE: 1984
<400> SEQUENCE: 8
Pro Lys Lys Lys Arg Lys Val
1               5

<210> SEQ ID NO 9
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 9
Ala Arg Arg Arg Arg Pro
1               5

<210> SEQ ID NO 10
<211> LENGTH: 10
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: NLS
<300> PUBLICATION INFORMATION:
<301> AUTHORS: Ghosh et al.,
<303> JOURNAL: Cell
<304> VOLUME: 62
<306> PAGES: 1019-1019
<307> DATE: 1990
<400> SEQUENCE: 10
Glu Glu Val Gln Arg Lys Arg Gln Lys Leu
1               5                   10

<210> SEQ ID NO 11
<211> LENGTH: 9
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: NLS
<300> PUBLICATION INFORMATION:
<303> JOURNAL: J. Cell. Biochem.
<304> VOLUME: 55
<305> ISSUE: 1
<306> PAGES: 32-58
<307> DATE: 1994
<300> PUBLICATION INFORMATION:
<301> AUTHORS: Nolan et al.,
<303> JOURNAL: Cell
<304> VOLUME: 64
<306> PAGES: 961-961
<307> DATE: 1991
<400> SEQUENCE: 11
Glu Glu Lys Arg Lys Arg Thr Tyr Glu
1               5

<210> SEQ ID NO 12
<211> LENGTH: 20
<212> TYPE: PRT
<213> ORGANISM: African clawed toad
<300> PUBLICATION INFORMATION:
<301> AUTHORS: Dingwall et al.,
<303> JOURNAL: J. Cell Biol.
<304> VOLUME: 30
<306> PAGES: 449-458
<307> DATE: 1988
<300> PUBLICATION INFORMATION:
<301> AUTHORS: Dingwall et al.,
<303> JOURNAL: Cell
<304> VOLUME: 30
<306> PAGES: 449-458
<307> DATE: 1982
<400> SEQUENCE: 12
Ala Val Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys
1               5                   10                  15
Lys Lys Leu Asp
            20

<210> SEQ ID NO 13
<211> LENGTH: 31
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: signal sequence
<300> PUBLICATION INFORMATION:
<301> AUTHORS: Nakauchi et al.,
<303> JOURNAL: Proc. Natl. Acad. Sci. U.S.A.
<304> VOLUME: 82
<306> PAGES: 5126-5126
<307> DATE: 1985
<400> SEQUENCE: 13
Met Ala Ser Pro Leu Thr Arg Phe Leu Ser Leu Asn Leu Leu Leu Leu
1               5                   10                  15
Gly Glu Ser Ile Leu Gly Ser Gly Glu Ala Lys Pro Gln Ala Pro
            20                  25                  30

<210> SEQ ID NO 14
<211> LENGTH: 21
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: signal sequence
<300> PUBLICATION INFORMATION:
<301> AUTHORS: Staunton et al.,
<303> JOURNAL: Nature
<304> VOLUME: 339
<306> PAGES: 61-61
<307> DATE: 1989
<400> SEQUENCE: 14
Met Ser Ser Phe Gly Tyr Arg Thr Leu Thr Val Ala Leu Phe Thr Leu
1               5                   10                  15
Ile Cys Cys Pro Gly
            20

<210> SEQ ID NO 15
<211> LENGTH: 51
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: transmembrane domains
<300> PUBLICATION INFORMATION:
<301> AUTHORS: Nakauchi et al.,
<303> JOURNAL: Proc. Natl. Acad. Sci. U.S.A.
<304> VOLUME: 82
<306> PAGES: 5126-5126
<307> DATE: 1985
<400> SEQUENCE: 15
Pro Gln Arg Pro Glu Asp Cys Arg Pro Arg Gly Ser Val Lys Gly Thr
1               5                   10                  15
Gly Leu Asp Phe Ala Cys Asp Ile Tyr Ile Trp Ala Pro Leu Ala Gly
            20                  25                  30
Ile Cys Val Ala Leu Leu Leu Ser Leu Ile Ile Thr Leu Ile Cys Tyr
        35                  40                  45
His Ser Arg
    50

<210> SEQ ID NO 16
<211> LENGTH: 33
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: transmembrane domain
<300> PUBLICATION INFORMATION:
<301> AUTHORS: Staunton et al.,
<303> JOURNAL: Nature
<304> VOLUME: 339
<306> PAGES: 61-61
<307> DATE: 1989
<400> SEQUENCE: 16
Met Val Ile Ile Val Thr Val Val Ser Val Leu Leu Ser Leu Phe Val
1               5                   10                  15
Thr Ser Val Leu Leu Cys Phe Ile Phe Gly Gln His Leu Arg Gln Gln
            20                  25                  30
Arg

<210> SEQ ID NO 17
<211> LENGTH: 37
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: anchor site
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Nature
<304> VOLUME: 333
<305> ISSUE: 6170
<306> PAGES: 269-272
<307> DATE: 1988
<300> PUBLICATION INFORMATION:
<303> JOURNAL: J. Biol. Chem.
<304> VOLUME: 266
<306> PAGES: 1250-1250
<307> DATE: 1991
<400> SEQUENCE: 17
Pro Asn Lys Gly Ser Gly Thr Thr Ser Gly Thr Thr Arg Leu Leu Ser
1               5                   10                  15
Gly His Thr Cys Phe Thr Leu Thr Gly Leu Leu Gly Thr Leu Val Thr
            20                  25                  30
Met Gly Leu Leu Thr
        35

<210> SEQ ID NO 18
<211> LENGTH: 14
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: myristylation sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Mol. Cell. Biol.
<304> VOLUME: 4
<305> ISSUE: 9
<306> PAGES: 1834-1834
<307> DATE: 1984
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Science
<304> VOLUME: 262
<306> PAGES: 1019-1024
<307> DATE: 1993
<400> SEQUENCE: 18
Met Gly Ser Ser Lys Ser Lys Pro Lys Asp Pro Ser Gln Arg
1               5                   10

<210> SEQ ID NO 19
<211> LENGTH: 26
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: palmitoylated sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: J. Biol. Chem.
<304> VOLUME: 269
<306> PAGES: 27791-27791
<307> DATE: 1994
<400> SEQUENCE: 19
Leu Leu Gln Arg Leu Phe Ser Arg Gln Asp Cys Cys Gly Asn Cys Ser
1               5                   10                  15
Asp Ser Glu Glu Glu Leu Pro Thr Arg Leu
            20                  25

<210> SEQ ID NO 20
<211> LENGTH: 20
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: palmitoylated sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: J. Mol. Neurosci.
<304> VOLUME: 5
<305> ISSUE: 3
<306> PAGES: 207-207
<307> DATE: 1994
<400> SEQUENCE: 20
Lys Gln Phe Arg Asn Cys Met Leu Thr Ser Leu Cys Cys Gly Lys Asn
1               5                   10                  15
Pro Leu Gly Asp
            20

<210> SEQ ID NO 21
<211> LENGTH: 19
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: palmitoylated sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Nature
<304> VOLUME: 302
<306> PAGES: 33-33
<307> DATE: 1983
<400> SEQUENCE: 21
Leu Asn Pro Pro Asp Glu Ser Gly Pro Gly Cys Met Ser Cys Lys Cys
1               5                   10                  15
Val Leu Ser

<210> SEQ ID NO 22
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: lysosomal degradation sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Ann. N. Y. Acad. Sci.
<304> VOLUME: 674
<306> PAGES: 58-58
<307> DATE: 1992
<400> SEQUENCE: 22
Lys Phe Glu Arg Gln
1               5

<210> SEQ ID NO 23
<211> LENGTH: 36
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: lysosomal membrane sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Cell. Mol. Biol. Res.
<304> VOLUME: 41
<306> PAGES: 405-405
<307> DATE: 1995
<400> SEQUENCE: 23
Met Leu Ile Pro Ile Ala Gly Phe Phe Ala Leu Ala Gly Leu Val Leu
1               5                   10                  15
Ile Val Leu Ile Ala Tyr Leu Ile Gly Arg Lys Arg Ser His Ala Gly
            20                  25                  30
Tyr Gln Thr Ile
        35

<210> SEQ ID NO 24
<211> LENGTH: 35
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: lysosomal membrane sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Biochem. Biophys. Res. Commun.
<304> VOLUME: 205
<306> PAGES: 1-5
<307> DATE: 1994
<400> SEQUENCE: 24
Leu Val Pro Ile Ala Val Gly Ala Ala Leu Ala Gly Val Leu Ile Leu
1               5                   10                  15
Val Leu Leu Ala Tyr Phe Ile Gly Leu Lys His His His Ala Gly Tyr
            20                  25                  30
Glu Gln Phe
        35

<210> SEQ ID NO 25
<211> LENGTH: 27
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: mitochondrial localization sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Eur. J. Biochem.
<304> VOLUME: 165
<306> PAGES: 1-6
<307> DATE: 1987
<400> SEQUENCE: 25
Met Leu Arg Thr Ser Ser Leu Phe Thr Arg Arg Val Gln Pro Ser Leu
1               5                   10                  15
Phe Ser Arg Asn Ile Leu Arg Leu Gln Ser Thr
            20                  25

<210> SEQ ID NO 26
<211> LENGTH: 25
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: mitochondrial localization sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Eur. J. Biochem.
<304> VOLUME: 165
<306> PAGES: 1-6
<307> DATE: 1987
<400> SEQUENCE: 26
Met Leu Ser Leu Arg Gln Ser Ile Arg Phe Phe Lys Pro Ala Thr Arg
1               5                   10                  15
Thr Leu Cys Ser Ser Arg Tyr Leu Leu
            20                  25

<210> SEQ ID NO 27
<211> LENGTH: 64
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: mitochondrial localization sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Eur. J. Biochem.
<304> VOLUME: 165
<306> PAGES: 1-6
<307> DATE: 1987
<400> SEQUENCE: 27
Met Phe Ser Met Leu Ser Lys Arg Trp Ala Gln Arg Thr Leu Ser Lys
1               5                   10                  15
Ser Phe Tyr Ser Thr Ala Thr Gly Ala Ala Ser Lys Ser Gly Lys Leu
            20                  25                  30
Thr Gln Lys Leu Val Thr Ala Gly Val Ala Ala Ala Gly Ile Thr Ala
        35                  40                  45
Ser Thr Leu Leu Tyr Ala Asp Ser Leu Thr Ala Glu Ala Met Thr Ala
    50                  55                  60

<210> SEQ ID NO 28
<211> LENGTH: 41
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: mitochondrial localization sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Eur. J. Biochem.
<304> VOLUME: 165
<306> PAGES: 1-6
<307> DATE: 1987
<400> SEQUENCE: 28
Met Lys Ser Phe Ile Thr Arg Asn Lys Thr Ala Ile Leu Ala Thr Val
1               5                   10                  15
Ala Ala Thr Gly Thr Ala Ile Gly Ala Tyr Tyr Tyr Tyr Asn Gln Leu
            20                  25                  30
Gln Gln Gln Gln Gln Arg Gly Lys Lys
        35                  40

<210> SEQ ID NO 29
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: endoplasmic reticulum sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Royal Society London Transactions B
<304> VOLUME: B
<306> PAGES: 1-10
<307> DATE: 1992
<400> SEQUENCE: 29
Lys Asp Glu Leu
1

<210> SEQ ID NO 30
<211> LENGTH: 15
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: adenovirus
<300> PUBLICATION INFORMATION:
<303> JOURNAL: EMBO J.
<304> VOLUME: 9
<306> PAGES: 3153-3153
<307> DATE: 1990
<400> SEQUENCE: 30
Leu Tyr Leu Ser Arg Arg Ser Phe Ile Asp Glu Lys Lys Met Pro
1               5                   10                  15

<210> SEQ ID NO 31
<211> LENGTH: 19
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: farnesylation sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Nature
<304> VOLUME: 302
<306> PAGES: 33-33
<307> DATE: 1983
<400> SEQUENCE: 31
Leu Asn Pro Pro Asp Glu Ser Gly Pro Gly Cys Met Ser Cys Lys Cys
1               5                   10                  15
Val Leu Ser

<210> SEQ ID NO 32
<211> LENGTH: 3
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: peroxisome matrix sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Proc. Natl. Acad. Sci. U.S.A.
<304> VOLUME: 4
<306> PAGES: 3264-3264
<307> DATE: 1987
<400> SEQUENCE: 32
Ser Lys Leu
1

<210> SEQ ID NO 33
<211> LENGTH: 15
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: geranylgeranylation sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Proc. Natl. Acad. Sci. U.S.A.
<304> VOLUME: 91
<306> PAGES: 11963-11963
<307> DATE: 1994
<400> SEQUENCE: 33
Leu Thr Glu Pro Thr Gln Pro Thr Arg Asn Gln Cys Cys Ser Asn
1               5                   10                  15

<210> SEQ ID NO 34
<211> LENGTH: 9
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: destruction sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: EMBO J.
<304> VOLUME: 1
<306> PAGES: 3053-3053
<307> DATE: 1996
<400> SEQUENCE: 34
Arg Thr Ala Leu Gly Asp Ile Gly Asn
1               5

<210> SEQ ID NO 35
<211> LENGTH: 20
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: secretory sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: J. Immunol.
<304> VOLUME: 155
<306> PAGES: 3946-3946
<307> DATE: 1995
<400> SEQUENCE: 35
Met Tyr Arg Met Gln Leu Leu Ser Cys Ile Ala Leu Ser Leu Ala Leu
1               5                   10                  15
Val Thr Asn Ser
            20

<210> SEQ ID NO 36
<211> LENGTH: 29
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: secretory sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Nucleic Acids Res.
<304> VOLUME: 7
<306> PAGES: 30-30
<307> DATE: 1979
<400> SEQUENCE: 36
Met Ala Thr Gly Ser Arg Thr Ser Leu Leu Leu Ala Phe Gly Leu Leu
1               5                   10                  15
Cys Leu Pro Trp Leu Gln Glu Gly Ser Ala Phe Pro Thr
            20                  25

<210> SEQ ID NO 37
<211> LENGTH: 27
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: secretory sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Nature
<304> VOLUME: 284
<306> PAGES: 26-26
<307> DATE: 1980
<400> SEQUENCE: 37
Met Ala Leu Trp Met Arg Leu Leu Pro Leu Leu Ala Leu Leu Ala Leu
1               5                   10                  15
Trp Gly Pro Asp Pro Ala Ala Ala Phe Val Asn
            20                  25

<210> SEQ ID NO 38
<211> LENGTH: 18
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: secretory sequence
<300> PUBLICATION INFORMATION:
<303> JOURNAL: Proc. Natl. Acad. Sci. U.S.A.
<304> VOLUME: 80
<306> PAGES: 3563-3563
<400> SEQUENCE: 38
Met Lys Ala Lys Leu Leu Val Leu Leu Tyr Ala Phe Val Ala Gly Asp
1               5                   10                  15
Gln Ile

<210> SEQ ID NO 39
<211> LENGTH: 24
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: secretory signal sequence
<400> SEQUENCE: 39
Met Gly Leu Thr Ser Gln Leu Leu Pro Pro Leu Phe Phe Leu Leu Ala
1               5                   10                  15
Cys Ala Gly Asn Phe Val His Gly
            20

<210> SEQ ID NO 40
<211> LENGTH: 7
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Description of Unknown Organism: stability sequence
<220> FEATURE:
<221> NAME/KEY: UNSURE
<222> LOCATION: (3)..(4)
<223> OTHER INFORMATION: The amino acid at position 3 is any amino acid.
<400> SEQUENCE: 40
Met Gly Xaa Gly Gly Pro Pro
1               5

<210> SEQ ID NO 41
<211> LENGTH: 2
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: linker consensus
<400> SEQUENCE: 41
Gly Ser
1

<210> SEQ ID NO 42
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: linker consensus
<400> SEQUENCE: 42
Gly Ser Gly Gly Ser
1               5

<210> SEQ ID NO 43
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: linker consensus
<400> SEQUENCE: 43
Gly Gly Gly Ser
1

We claim:
 1. A method for producing an expression shuttle vector comprising a heterologous linear nucleic acid insert and capable of expressing said insert in a mammalian cell, comprising:
(a) transforming yeast with a shuttle vector which shuttle vector comprises:
(i) an origin of replication functional in yeast;
(ii) a selectable gene functional in yeast;
(iii) a promoter functional in a mammalian cell and capable of directing transcription of a polypeptide coding sequence operably linked downstream of said promoter; and
(iv) an insertion site for an heterologous nucleic acid;
wherein said insertion site is an homologous recombination site comprising a first nucleic acid sequence and a second nucleic acid sequence, which first and second nucleic acid sequences are contiguous, and wherein said first and second nucleic acid sequences taken separately correspond to a nucleic acid sequence at the 5′ end of said heterologous nucleic acid and a nucleic acid sequence at the 3′ end of said heterologous nucleic acid, respectively, and wherein said first and second nucleic acid sequences taken separately comprise nucleic acid sequences of from about 10 to about 100 nucleotides in length;
(b) transforming yeast with a vector comprising an heterologous nucleic acid flanked by said first nucleic acid sequence and said second nucleic acid sequence; and
(c) allowing said shuttle vector to recombine so as to insert said heterologous nucleic acid into said shuttle vector at said homologous recombination site.
 2. The method according to claim 1, wherein said linear nucleic acid is a PCR product.
 3. The method according to claim 2, wherein said PCR product is produced using primers comprising said first nucleic acid sequence, or the complement thereof, and said second nucleic acid sequence, or the complement thereof.
 4. A yeast cell comprising an expression shuttle vector, wherein the vector comprises:
(i) an origin of replication functional in yeast;
(ii) a selectable gene functional in yeast;
(iii) a promoter functional in a mammalian cell and capable of directing transcription of a polypeptide coding sequence operably linked downstream of said promoter; and
(iv) an insertion site for an heterologous nucleic acid;
wherein said insertion site is an homologous recombination site comprising a first nucleic acid sequence and a second nucleic acid sequence, which first and second nucleic acid sequences are contiguous, and wherein said first and second nucleic acid sequences taken separately correspond to a nucleic acid sequence at the 5′ end of said heterologous nucleic acid and a nucleic acid sequence at the 3′ end of said heterologous nucleic acid, respectively, and wherein said first and second nucleic acid sequences taken separately comprise nucleic acid sequences of from about 10 to about 100 nucleotides in length;
wherein the yeast cell further comprises a vector comprising a heterologous nucleic acid molecule flanked by the first nucleic acid sequence and the second nucleic acid sequence; such that the shuttle vector recombines to insert the heterologous nucleic acid into the shuttle vector at the homologous recombination site.
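
By way of illustration only, and forming no part of the claims, the primer design of claim 3 and the recombination step (c) of claim 1 may be sketched in Python as follows. The sketch treats all sequences as plain strings and models recombination as replacement of the contiguous first/second sequence junction by the flanked insert. The function names, the anneal_len parameter, and the example sequences are hypothetical and do not appear in the specification; the strict 10 to 100 nucleotide check is a simplification of "from about 10 to about 100 nucleotides".

```python
# Illustrative sketch only; names and example sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(first_site: str, second_site: str, insert: str,
                   anneal_len: int = 20) -> tuple:
    """Build a primer pair whose 5' tails carry the two homology sequences.

    The forward primer is the first sequence followed by the start of the
    insert; the reverse primer is the reverse complement of the end of the
    insert followed by the second sequence.
    """
    for name, site in (("first", first_site), ("second", second_site)):
        # Simplified, strict reading of "about 10 to about 100 nucleotides".
        if not 10 <= len(site) <= 100:
            raise ValueError(f"{name} sequence must be 10-100 nt long")
    if len(insert) < anneal_len:
        raise ValueError("insert is shorter than the annealing region")
    forward = first_site + insert[:anneal_len]
    reverse = reverse_complement(insert[-anneal_len:] + second_site)
    return forward, reverse


def flanked_product(first_site: str, second_site: str, insert: str) -> str:
    """Top strand of the PCR product: the insert flanked by both sequences."""
    return first_site + insert + second_site


def recombine(vector: str, first_site: str, second_site: str,
              product: str) -> str:
    """Model insertion at the contiguous first/second sequence junction."""
    junction = first_site + second_site
    if vector.count(junction) != 1:
        raise ValueError("vector must contain the junction exactly once")
    if not (product.startswith(first_site) and product.endswith(second_site)):
        raise ValueError("product is not flanked by the homology sequences")
    return vector.replace(junction, product, 1)


if __name__ == "__main__":
    first, second = "ACGTACGTACGT", "TTGACCATGGAA"   # made-up 12 nt arms
    insert = "ATGGCTAAAGGTCATTAA"                    # made-up insert
    vector = "GGGG" + first + second + "CCCC"        # made-up vector
    fwd, rev = design_primers(first, second, insert, anneal_len=6)
    product = flanked_product(first, second, insert)
    print(fwd, rev, recombine(vector, first, second, product))
```

The string replacement stands in for the yeast recombination machinery and ignores strand, circularity, and efficiency; it is meant only to show how the two flanking sequences determine where the insert lands.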